> I sent a patch to show a correct output of 'collie cluster info'
> without segfault. Can you try it out?

I went ahead and pulled down "77f26b4" as I was using "3a2801b" for my testing.

> From your log messages, it looks like node174 stores a higher epoch.
> I think if you run a sheep daemon on node174 first, Sheepdog would
> work again.

I had already tried starting node174 first, but with the new code, at least "collie cluster info" doesn't segfault anymore:

[root@node174 ~]# collie cluster info
Cluster status: Waiting for other nodes joining

Creation time        Epoch Nodes
2011-09-15 20:21:18     17 [192.168.0.173:7000, 192.168.0.174:7000]
2011-09-15 20:21:18     16 [192.168.0.157:7000, 192.168.0.173:7000, 192.168.0.174:7000]
2011-09-15 20:21:18     15 [192.168.0.156:7000, 192.168.0.157:7000, 192.168.0.173:7000, 192.168.0.174:7000]
2011-09-15 20:21:18     14 [192.168.0.156:7000, 192.168.0.173:7000, 192.168.0.174:7000]
2011-09-15 20:21:18     13 [192.168.0.173:7000, 192.168.0.174:7000]
2011-09-15 20:21:18     12 [192.168.0.156:7000, 192.168.0.173:7000, 192.168.0.174:7000]
2011-09-15 20:21:18     11 [192.168.0.156:7000, 192.168.0.157:7000, 192.168.0.173:7000, 192.168.0.174:7000]
2011-09-15 20:21:18     10 [192.168.0.156:7000, 192.168.0.173:7000, 192.168.0.174:7000]

But I still can't get the other nodes to join.

Here is the sheep.log from node174:

Sep 19 11:08:43 init_epoch_path(1911) found the obj dir, /node/sheepdog/obj//00000001
Sep 19 11:08:43 init_epoch_path(1911) found the obj dir, /node/sheepdog/obj//00000002
Sep 19 11:08:43 init_epoch_path(1911) found the obj dir, /node/sheepdog/obj//00000003
Sep 19 11:08:43 init_epoch_path(1911) found the obj dir, /node/sheepdog/obj//00000004
Sep 19 11:08:43 init_epoch_path(1911) found the obj dir, /node/sheepdog/obj//00000005
Sep 19 11:08:43 init_epoch_path(1911) found the obj dir, /node/sheepdog/obj//00000006
Sep 19 11:08:43 init_epoch_path(1911) found the obj dir, /node/sheepdog/obj//00000007
Sep 19 11:08:43 init_epoch_path(1932) found the vdi obj, 80f5969500000000
Sep 19 11:08:43 init_epoch_path(1932) found the vdi obj, 80f5969400000000
Sep 19 11:08:43 init_epoch_path(1932) found the vdi obj, 80f5969600000000
Sep 19 11:08:43 init_epoch_path(1911) found the obj dir, /node/sheepdog/obj//00000008
Sep 19 11:08:43 init_epoch_path(1932) found the vdi obj, 80f5969600000000
Sep 19 11:08:43 init_epoch_path(1932) found the vdi obj, 80f5969400000000
Sep 19 11:08:43 init_epoch_path(1932) found the vdi obj, 80f5969500000000
Sep 19 11:08:43 init_epoch_path(1911) found the obj dir, /node/sheepdog/obj//00000009
Sep 19 11:08:43 init_epoch_path(1932) found the vdi obj, 80f5969600000000
Sep 19 11:08:43 init_epoch_path(1932) found the vdi obj, 80f5969500000000
Sep 19 11:08:43 init_epoch_path(1932) found the vdi obj, 80f5969400000000
Sep 19 11:08:43 init_epoch_path(1911) found the obj dir, /node/sheepdog/obj//00000010
Sep 19 11:08:43 init_epoch_path(1932) found the vdi obj, 80f5969600000000
Sep 19 11:08:43 init_epoch_path(1932) found the vdi obj, 80f5969500000000
Sep 19 11:08:43 init_epoch_path(1932) found the vdi obj, 80f5969400000000
Sep 19 11:08:43 init_epoch_path(1911) found the obj dir, /node/sheepdog/obj//00000011
Sep 19 11:08:43 init_epoch_path(1932) found the vdi obj, 80f5969600000000
Sep 19 11:08:43 init_epoch_path(1932) found the vdi obj, 80f5969400000000
Sep 19 11:08:43 init_epoch_path(1932) found the vdi obj, 80f5969500000000
Sep 19 11:08:43 init_epoch_path(1911) found the obj dir, /node/sheepdog/obj//00000012
Sep 19 11:08:43 init_epoch_path(1911) found the obj dir, /node/sheepdog/obj//00000013
Sep 19 11:08:43 init_epoch_path(1932) found the vdi obj, 80f5969500000000
Sep 19 11:08:43 init_epoch_path(1932) found the vdi obj, 80f5969400000000
Sep 19 11:08:43 init_epoch_path(1932) found the vdi obj, 80f5969600000000
Sep 19 11:08:43 init_epoch_path(1911) found the obj dir, /node/sheepdog/obj//00000014
Sep 19 11:08:43 init_epoch_path(1911) found the obj dir, /node/sheepdog/obj//00000015
Sep 19 11:08:43 init_epoch_path(1932) found the vdi obj, 80f5969600000000
Sep 19 11:08:43 init_epoch_path(1932) found the vdi obj, 80f5969400000000
Sep 19 11:08:43 init_epoch_path(1932) found the vdi obj, 80f5969500000000
Sep 19 11:08:43 init_epoch_path(1911) found the obj dir, /node/sheepdog/obj//00000016
Sep 19 11:08:43 init_epoch_path(1932) found the vdi obj, 80f5969400000000
Sep 19 11:08:43 init_epoch_path(1932) found the vdi obj, 80f5969500000000
Sep 19 11:08:43 init_epoch_path(1932) found the vdi obj, 80f5969600000000
Sep 19 11:08:43 init_epoch_path(1911) found the obj dir, /node/sheepdog/obj//00000017
Sep 19 11:08:43 jrnl_recover(2238) Openning the directory /node/sheepdog/journal/00000017/.
Sep 19 11:08:43 worker_routine(206) started this thread 0
Sep 19 11:08:43 worker_routine(206) started this thread 0
Sep 19 11:08:43 worker_routine(206) started this thread 3
Sep 19 11:08:43 worker_routine(206) started this thread 0
Sep 19 11:08:43 worker_routine(206) started this thread 1
Sep 19 11:08:43 worker_routine(206) started this thread 0
Sep 19 11:08:43 worker_routine(206) started this thread 0
Sep 19 11:08:43 worker_routine(206) started this thread 1
Sep 19 11:08:43 worker_routine(206) started this thread 2
Sep 19 11:08:43 worker_routine(206) started this thread 2
Sep 19 11:08:43 worker_routine(206) started this thread 3
Sep 19 11:08:43 set_addr(1723) addr = 192.168.0.174, port = 7000
Sep 19 11:08:43 create_cluster(1778) zone id = 1
Sep 19 11:08:43 main(167) Sheepdog daemon (version 0.2.3) started
Sep 19 11:08:43 sd_confchg(1621) confchg nodeid aed92998
Sep 19 11:08:43 sd_confchg(1623) 1 0 1
Sep 19 11:08:43 sd_confchg(1627) [0] node_id: aed92998, pid: 8646, reason: 0
Sep 19 11:08:43 sd_confchg(1641) allow new confchg, 0x254e020
Sep 19 11:08:43 start_cpg_event_work(1465) 0 0
Sep 19 11:08:43 cpg_event_fn(1279) 0x254e020, 0 2
Sep 19 11:08:43 cpg_event_done(1315) 0x254e020
Sep 19 11:08:43 __sd_confchg_done(1206) 8646 aed92998
Sep 19 11:08:43 update_cluster_info(683) l nodeid: aed92998, pid: 8646, ip: 192.168.0.174:7000
Sep 19 11:08:43 cpg_event_done(1373) free 0x254e020
Sep 19 11:09:38 sd_confchg(1621) confchg nodeid add92998
Sep 19 11:09:38 sd_confchg(1623) 2 0 1
Sep 19 11:09:38 sd_confchg(1627) [0] node_id: add92998, pid: 8097, reason: 1940777327
Sep 19 11:09:38 sd_confchg(1627) [1] node_id: aed92998, pid: 8646, reason: 6485728
Sep 19 11:09:38 sd_confchg(1641) allow new confchg, 0x254e020
Sep 19 11:09:38 start_cpg_event_work(1465) 0 0
Sep 19 11:09:38 cpg_event_fn(1279) 0x254e020, 0 2
Sep 19 11:09:38 cpg_event_done(1315) 0x254e020
Sep 19 11:09:38 __sd_confchg_done(1232) l nodeid: aed92998, pid: 8646, ip: 192.168.0.174:7000
Sep 19 11:09:38 cpg_event_done(1373) free 0x254e020
Sep 19 11:09:38 sd_deliver(987) op: 1, state: 1, size: 32840, from: 192.168.0.173:7000, nodeid: add92998, pid: 8097
Sep 19 11:09:38 sd_deliver(996) allow new deliver, 0x254e1a0
Sep 19 11:09:38 start_cpg_event_work(1465) 0 1
Sep 19 11:09:38 cpg_event_fn(1279) 0x254e1a0, 1 2
Sep 19 11:09:38 cpg_event_fn(1293) 1
Sep 19 11:09:38 __sd_deliver(839) op: 1, state: 1, size: 32840, from: 192.168.0.173:7000, pid: 8097
Sep 19 11:09:38 cpg_event_done(1315) 0x254e1a0
Sep 19 11:09:38 __sd_deliver_done(955) op: 1, state: 1, size: 32840, from: 192.168.0.173:7000
Sep 19 11:09:38 get_cluster_status(440) sheepdog is waiting with newer epoch, 16 17 192.168.0.173:7000
Sep 19 11:09:38 cpg_event_done(1373) free 0x254e1a0
Sep 19 11:09:39 sd_deliver(987) op: 1, state: 3, size: 32840, from: 192.168.0.173:7000, nodeid: aed92998, pid: 8646
Sep 19 11:09:39 sd_deliver(996) allow new deliver, 0x254e1a0
Sep 19 11:09:39 start_cpg_event_work(1465) 0 1
Sep 19 11:09:39 cpg_event_fn(1279) 0x254e1a0, 1 2
Sep 19 11:09:39 cpg_event_fn(1293) 3
Sep 19 11:09:39 __sd_deliver(839) op: 1, state: 3, size: 32840, from: 192.168.0.173:7000, pid: 8097
Sep 19 11:09:39 cpg_event_done(1315) 0x254e1a0
Sep 19 11:09:39 __sd_deliver_done(955) op: 1, state: 3, size: 32840, from: 192.168.0.173:7000
Sep 19 11:09:39 cpg_event_done(1373) free 0x254e1a0
Sep 19 11:09:58 sd_confchg(1621) confchg nodeid 9cd92998
Sep 19 11:09:58 sd_confchg(1623) 3 0 1
Sep 19 11:09:58 sd_confchg(1627) [0] node_id: 9cd92998, pid: 14918, reason: 0
Sep 19 11:09:58 sd_confchg(1627) [1] node_id: add92998, pid: 8097, reason: 0
Sep 19 11:09:58 sd_confchg(1627) [2] node_id: aed92998, pid: 8646, reason: 0
Sep 19 11:09:58 sd_confchg(1641) allow new confchg, 0x254e020
Sep 19 11:09:58 start_cpg_event_work(1465) 0 0
Sep 19 11:09:58 cpg_event_fn(1279) 0x254e020, 0 2
Sep 19 11:09:58 cpg_event_done(1315) 0x254e020
Sep 19 11:09:58 __sd_confchg_done(1232) l nodeid: aed92998, pid: 8646, ip: 192.168.0.174:7000
Sep 19 11:09:58 cpg_event_done(1373) free 0x254e020
Sep 19 11:09:58 sd_deliver(987) op: 1, state: 1, size: 32840, from: 192.168.0.156:7000, nodeid: 9cd92998, pid: 14918
Sep 19 11:09:58 sd_deliver(996) allow new deliver, 0x254e1a0
Sep 19 11:09:58 start_cpg_event_work(1465) 0 1
Sep 19 11:09:58 cpg_event_fn(1279) 0x254e1a0, 1 2
Sep 19 11:09:58 cpg_event_fn(1293) 1
Sep 19 11:09:58 __sd_deliver(839) op: 1, state: 1, size: 32840, from: 192.168.0.156:7000, pid: 14918
Sep 19 11:09:58 cpg_event_done(1315) 0x254e1a0
Sep 19 11:09:58 __sd_deliver_done(955) op: 1, state: 1, size: 32840, from: 192.168.0.156:7000
Sep 19 11:09:58 get_cluster_status(440) sheepdog is waiting with newer epoch, 15 17 192.168.0.156:7000
Sep 19 11:09:58 cpg_event_done(1373) free 0x254e1a0
Sep 19 11:09:58 sd_deliver(987) op: 1, state: 3, size: 32840, from: 192.168.0.156:7000, nodeid: aed92998, pid: 8646
Sep 19 11:09:58 sd_deliver(996) allow new deliver, 0x254e1a0
Sep 19 11:09:58 start_cpg_event_work(1465) 0 1
Sep 19 11:09:58 cpg_event_fn(1279) 0x254e1a0, 1 2
Sep 19 11:09:58 cpg_event_fn(1293) 3
Sep 19 11:09:58 __sd_deliver(839) op: 1, state: 3, size: 32840, from: 192.168.0.156:7000, pid: 14918
Sep 19 11:09:58 cpg_event_done(1315) 0x254e1a0
Sep 19 11:09:58 __sd_deliver_done(955) op: 1, state: 3, size: 32840, from: 192.168.0.156:7000
Sep 19 11:09:58 cpg_event_done(1373) free 0x254e1a0
Sep 19 11:10:04 sd_confchg(1621) confchg nodeid 9cd92998
Sep 19 11:10:04 sd_confchg(1623) 4 0 1
Sep 19 11:10:04 sd_confchg(1627) [0] node_id: 9cd92998, pid: 14918, reason: 0
Sep 19 11:10:04 sd_confchg(1627) [1] node_id: 9dd92998, pid: 8515, reason: 0
Sep 19 11:10:04 sd_confchg(1627) [2] node_id: add92998, pid: 8097, reason: 1940777327
Sep 19 11:10:04 sd_confchg(1627) [3] node_id: aed92998, pid: 8646, reason: 6485728
Sep 19 11:10:04 sd_confchg(1641) allow new confchg, 0x254e020
Sep 19 11:10:04 start_cpg_event_work(1465) 0 0
Sep 19 11:10:04 cpg_event_fn(1279) 0x254e020, 0 2
Sep 19 11:10:04 cpg_event_done(1315) 0x254e020
Sep 19 11:10:04 __sd_confchg_done(1232) l nodeid: aed92998, pid: 8646, ip: 192.168.0.174:7000
Sep 19 11:10:04 cpg_event_done(1373) free 0x254e020
Sep 19 11:10:04 sd_deliver(987) op: 1, state: 1, size: 32840, from: 192.168.0.157:7000, nodeid: 9dd92998, pid: 8515
Sep 19 11:10:04 sd_deliver(996) allow new deliver, 0x254e1a0
Sep 19 11:10:04 start_cpg_event_work(1465) 0 1
Sep 19 11:10:04 cpg_event_fn(1279) 0x254e1a0, 1 2
Sep 19 11:10:04 cpg_event_fn(1293) 1
Sep 19 11:10:04 __sd_deliver(839) op: 1, state: 1, size: 32840, from: 192.168.0.157:7000, pid: 8515
Sep 19 11:10:04 cpg_event_done(1315) 0x254e1a0
Sep 19 11:10:04 __sd_deliver_done(955) op: 1, state: 1, size: 32840, from: 192.168.0.157:7000
Sep 19 11:10:04 get_cluster_status(440) sheepdog is waiting with newer epoch, 16 17 192.168.0.157:7000
Sep 19 11:10:04 cpg_event_done(1373) free 0x254e1a0
Sep 19 11:10:04 sd_deliver(987) op: 1, state: 3, size: 32840, from: 192.168.0.157:7000, nodeid: aed92998, pid: 8646
Sep 19 11:10:04 sd_deliver(996) allow new deliver, 0x254e1a0
Sep 19 11:10:04 start_cpg_event_work(1465) 0 1
Sep 19 11:10:04 cpg_event_fn(1279) 0x254e1a0, 1 2
Sep 19 11:10:04 cpg_event_fn(1293) 3
Sep 19 11:10:04 __sd_deliver(839) op: 1, state: 3, size: 32840, from: 192.168.0.157:7000, pid: 8515
Sep 19 11:10:04 cpg_event_done(1315) 0x254e1a0
Sep 19 11:10:04 __sd_deliver_done(955) op: 1, state: 3, size: 32840, from: 192.168.0.157:7000
Sep 19 11:10:04 cpg_event_done(1373) free 0x254e1a0
Sep 19 11:10:10 listen_handler(613) accepted a new connection, 11
Sep 19 11:10:10 queue_request(211) 82
Sep 19 11:10:10 start_cpg_event_work(1465) 0 2
Sep 19 11:10:10 cluster_queue_request(261) 0x7f92a13fb010 82
Sep 19 11:10:10 client_handler(563) closed a connection, 11
Sep 19 11:10:13 listen_handler(613) accepted a new connection, 11
Sep 19 11:10:13 queue_request(211) 87
Sep 19 11:10:13 start_cpg_event_work(1465) 0 2
Sep 19 11:10:13 cluster_queue_request(261) 0x254e340 87
Sep 19 11:10:13 client_handler(563) closed a connection, 11

Thanks for your assistance with this.