2011/8/27 MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp>:
> Hmm, it looks there is no error. What happens if you run the laptop3
> first, and run the laptop1 and laptop2 next? The two node can join
> Sheepdog correctly?

No.
I ran 'collie cluster shutdown' from laptop2 (the last one that was still on).
I stopped corosync on all 3 nodes.
I started corosync and sheep on node3 and it gave me the same message.
I started corosync and sheep on node2 and node1.
On node2 I get no output from 'sheep cluster info'.
From node1 I get

# collie cluster info
Waiting for a format operation
Ctime Epoch Nodes
1970-01-01 01:00:00 0 []

Node1 has lots of messages in /mnt/sheepdog/sheepdog.log like these:
...
Aug 29 12:09:47 find_tgt_node(1107) 26, 64, 54, 128, 0
Aug 29 12:09:47 find_tgt_node(1146) 26, 0, 54
Aug 29 12:09:47 __recover_one(1277) rename /mnt/sheepdog//obj/00000004/00a34c67000008a6.tmp to /mnt/sheepdog//obj/00000004/00a34c67000008a6
Aug 29 12:09:47 __recover_one(1283) recovered oid a34c67000008a6 to epoch 4
Aug 29 12:09:47 recover_one(1340) 546 1140, a34c6700000067
Aug 29 12:09:47 ob_open(491) failed to open /mnt/sheepdog//obj/00000004/00a34c6700000067, No such file or directory
Aug 29 12:09:47 recover_one(1397) 54, 2, 0
Aug 29 12:09:47 __recover_one(1176) recover obj a34c6700000067 from epoch 3
Aug 29 12:09:47 find_tgt_node(1107) 26, 64, 54, 128, 0
Aug 29 12:09:47 find_tgt_node(1146) 26, 0, 54
Aug 29 12:09:47 __recover_one(1277) rename /mnt/sheepdog//obj/00000004/00a34c6700000067.tmp to /mnt/sheepdog//obj/00000004/00a34c6700000067
Aug 29 12:09:47 __recover_one(1283) recovered oid a34c6700000067 to epoch 4
Aug 29 12:09:47 recover_one(1340) 547 1140, a34c67000008a3
Aug 29 12:09:47 ob_open(491) failed to open /mnt/sheepdog//obj/00000004/00a34c67000008a3, No such file or directory
Aug 29 12:09:47 recover_one(1397) 54, 2, 0
Aug 29 12:09:47 __recover_one(1176) recover obj a34c67000008a3 from epoch 3
Aug 29 12:09:47 find_tgt_node(1107) 26, 64, 54, 128, 0
Aug 29 12:09:47 find_tgt_node(1146) 26, 0, 54
Aug 29 12:09:47 __recover_one(1277) rename /mnt/sheepdog//obj/00000004/00a34c67000008a3.tmp to /mnt/sheepdog//obj/00000004/00a34c67000008a3
Aug 29 12:09:47 __recover_one(1283) recovered oid a34c67000008a3 to epoch 4
Aug 29 12:09:47 recover_one(1340) 548 1140, a34c670000086e
Aug 29 12:09:47 ob_open(491) failed to open /mnt/sheepdog//obj/00000004/00a34c670000086e, No such file or directory
Aug 29 12:09:47 recover_one(1397) 54, 2, 0
Aug 29 12:09:47 __recover_one(1176) recover obj a34c670000086e from epoch 3
Aug 29 12:09:47 find_tgt_node(1107) 26, 64, 54, 128, 0
Aug 29 12:09:47 find_tgt_node(1146) 26, 0, 54
Aug 29 12:09:48 __recover_one(1277) rename /mnt/sheepdog//obj/00000004/00a34c670000086e.tmp to /mnt/sheepdog//obj/00000004/00a34c670000086e
Aug 29 12:09:48 __recover_one(1283) recovered oid a34c670000086e to epoch 4
Aug 29 12:09:48 recover_one(1340) 549 1140, a34c67000007e6
Aug 29 12:09:48 ob_open(491) failed to open /mnt/sheepdog//obj/00000004/00a34c67000007e6, No such file or directory
Aug 29 12:09:48 recover_one(1397) 56, 2, 1
Aug 29 12:09:48 __recover_one(1176) recover obj a34c67000007e6 from epoch 3
Aug 29 12:09:48 find_tgt_node(1107) 27, 64, 56, 128, 1
Aug 29 12:09:48 find_tgt_node(1146) 4294967295, 1, 57
Aug 29 12:09:48 __recover_one(1181) cannot find target node, a34c67000007e6
Aug 29 12:09:48 __recover_one(1176) recover obj a34c67000007e6 from epoch 3
Aug 29 12:09:48 find_tgt_node(1107) 27, 64, 56, 128, 0
Aug 29 12:09:48 find_tgt_node(1114) 27, 0, 56, 128
...

Node2 has lots of messages in /mnt/sheepdog/sheepdog.log like these:
...
Aug 24 12:07:12 __recover_one(1176) recover obj a34c6700000040 from epoch 1
Aug 24 12:07:12 find_tgt_node(1107) 29, 64, 59, 128, 0
Aug 24 12:07:12 find_tgt_node(1146) 29, 0, 59
Aug 24 12:07:12 __recover_one(1277) rename /mnt/sheepdog//obj/00000002/00a34c6700000040.tmp to /mnt/sheepdog//obj/00000002/00a34c6700000040
Aug 24 12:07:12 __recover_one(1283) recovered oid a34c6700000040 to epoch 2
Aug 24 12:07:12 recover_one(1340) 584 1140, a34c67000008f8
Aug 24 12:07:12 ob_open(491) failed to open /mnt/sheepdog//obj/00000002/00a34c67000008f8, No such file or directory
Aug 24 12:07:12 recover_one(1397) 59, 2, 0
Aug 24 12:07:12 __recover_one(1176) recover obj a34c67000008f8 from epoch 1
Aug 24 12:07:12 find_tgt_node(1107) 29, 64, 59, 128, 0
Aug 24 12:07:12 find_tgt_node(1146) 29, 0, 59
Aug 24 12:07:12 __recover_one(1277) rename /mnt/sheepdog//obj/00000002/00a34c67000008f8.tmp to /mnt/sheepdog//obj/00000002/00a34c67000008f8
Aug 24 12:07:12 __recover_one(1283) recovered oid a34c67000008f8 to epoch 2
Aug 24 12:07:12 recover_one(1340) 585 1140, a34c6700000949
Aug 24 12:07:12 ob_open(491) failed to open /mnt/sheepdog//obj/00000002/00a34c6700000949, No such file or directory
Aug 24 12:07:12 recover_one(1397) 60, 2, 1
Aug 24 12:07:12 __recover_one(1176) recover obj a34c6700000949 from epoch 1
Aug 24 12:07:12 find_tgt_node(1107) 29, 64, 60, 128, 1
Aug 24 12:07:12 find_tgt_node(1146) 4294967295, 1, 61
Aug 24 12:07:12 __recover_one(1181) cannot find target node, a34c6700000949
Aug 24 12:07:12 __recover_one(1176) recover obj a34c6700000949 from epoch 1
Aug 24 12:07:12 find_tgt_node(1107) 29, 64, 60, 128, 0
Aug 24 12:07:12 find_tgt_node(1114) 29, 0, 60, 128
Aug 24 12:07:12 __recover_one(1277) rename /mnt/sheepdog//obj/00000002/00a34c6700000949.tmp to /mnt/sheepdog//obj/00000002/00a34c6700000949
Aug 24 12:07:12 __recover_one(1283) recovered oid a34c6700000949 to epoch 2
Aug 24 12:07:12 recover_one(1340) 586 1140, a34c670000062b
Aug 24 12:07:12 ob_open(491) failed to open /mnt/sheepdog//obj/00000002/00a34c670000062b, No such file or directory
Aug 24 12:07:12 recover_one(1397) 60, 2, 1
....

Node3 has some
...
Aug 29 12:06:51 cpg_event_done(1316) 0x37d8890
Aug 29 12:06:51 __sd_deliver_done(980) op: 1, state: 1, size: 32840, from: ::c0a8:21b:581b:4000:0:0:2
Aug 29 12:06:51 send_join_response(939) 3409 453159104
Aug 29 12:06:51 join(484) joining node send a wrong version message
Aug 29 12:06:51 cpg_event_done(1370) free 0x37d8890
Aug 29 12:06:51 sd_deliver(1012) op: 1, state: 3, size: 32840, from: ::c0a8:21b:581b:4000:0:0:1, nodeid: 990030016, pid: 7976
Aug 29 12:06:51 sd_deliver(1021) allow new deliver, 0x37d8890
Aug 29 12:06:51 start_cpg_event_work(1448) 0 1
Aug 29 12:06:51 cpg_event_fn(1280) 0x37d8890, 1 2
Aug 29 12:06:51 cpg_event_fn(1294) 3
Aug 29 12:06:51 __sd_deliver(861) op: 1, state: 3, size: 32840, from: ::c0a8:21b:581b:4000:0:0:1, pid: 3409
Aug 29 12:06:51 cpg_event_done(1316) 0x37d8890
Aug 29 12:06:51 __sd_deliver_done(980) op: 1, state: 3, size: 32840, from: ::c0a8:21b:581b:4000:0:0:1
Aug 29 12:06:51 cpg_event_done(1370) free 0x37d8890
Aug 29 12:06:51 sd_deliver(1012) op: 1, state: 3, size: 32840, from: ::c0a8:21b:581b:4000:0:0:2, nodeid: 520267968, pid: 21674
Aug 29 12:06:51 sd_deliver(1021) allow new deliver, 0x37d8890
Aug 29 12:06:51 start_cpg_event_work(1448) 0 1
Aug 29 12:06:51 cpg_event_fn(1280) 0x37d8890, 1 2
Aug 29 12:06:51 cpg_event_fn(1294) 3
Aug 29 12:06:51 __sd_deliver(861) op: 1, state: 3, size: 32840, from: ::c0a8:21b:581b:4000:0:0:2, pid: 3409
Aug 29 12:06:51 cpg_event_done(1316) 0x37d8890
Aug 29 12:06:51 __sd_deliver_done(980) op: 1, state: 3, size: 32840, from: ::c0a8:21b:581b:4000:0:0:2
Aug 29 12:06:51 cpg_event_done(1370) free 0x37d8890
Aug 29 12:10:17 listen_handler(523) accepted a new connection, 11
Aug 29 12:10:17 start_cpg_event_work(1448) 0 2
Aug 29 12:10:17 cluster_queue_request(255) 0x37d8990 87
Aug 29 12:10:17 client_handler(484) closed a connection, 11
Aug 29 12:13:08 listen_handler(523) accepted a new connection, 11
Aug 29 12:13:08 start_cpg_event_work(1448) 0 2
Aug 29 12:13:08 cluster_queue_request(255) 0x37d8960 87
Aug 29 12:13:08 client_handler(484) closed a connection, 11
...
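
Since the recovery messages repeat for many objects, it may help to tally them rather than read them line by line. Below is a minimal Python sketch, assuming only the two message formats visible in the excerpts above ("recovered oid ... to epoch N" and "cannot find target node, ..."). The function name `summarize` and the regular expressions are illustrative and not part of sheepdog or collie.

```python
import re
from collections import Counter

# Patterns copied from the log excerpts quoted in this message.
# Other sheepdog.log message formats are not assumed here.
RECOVERED = re.compile(r"recovered oid ([0-9a-f]+) to epoch (\d+)")
NO_TARGET = re.compile(r"cannot find target node, ([0-9a-f]+)")


def summarize(lines):
    """Return (recovered, no_target) for an iterable of log lines.

    recovered maps each oid to the last epoch it was recovered to.
    no_target counts "cannot find target node" messages per oid.
    """
    recovered = {}
    no_target = Counter()
    for line in lines:
        match = RECOVERED.search(line)
        if match:
            recovered[match.group(1)] = int(match.group(2))
            continue
        match = NO_TARGET.search(line)
        if match:
            no_target[match.group(1)] += 1
    return recovered, no_target
```

To run it against a real log, something like `summarize(open("/mnt/sheepdog/sheepdog.log"))` should work, after which the oids in `no_target` that never appear in `recovered` would be the ones worth a closer look.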