MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp> writes:

> Yes, I pushed many patches which simplify cluster communications, so
> the problem might be solved with the current master branch. Anyway,
> I'll try to find what caused the problem. :)

Hi Kazutaka.

I pulled the current head of master, 5d8ab0de8e. I'm afraid this now
breaks quite spectacularly when I first try to create a drive (which does
a vdi create followed by a handful of vdi setattrs and getattrs) on my one
host, three sheep minicluster: a setattr fails and then

0026# collie vdi list
  name        id    size    used  shared    creation time   vdi id
------------------------------------------------------------------
failed to read object, 800a8ac400000000 Remote node has a new epoch
failed to read a inode header

The three sheep.logs read

Oct 24 10:34:16 read_epoch(2036) failed to read epoch 0
Oct 24 10:34:16 send_join_request(976) ip: 172.16.101.7, port: 7000
Oct 24 10:34:16 main(216) Sheepdog daemon (version 0.2.4-12-g737596b-dirty) started
Oct 24 10:34:16 cluster_queue_request(192) 0x7f9336afa010 84
Oct 24 10:34:16 read_epoch(2036) failed to read epoch 0
Oct 24 10:34:16 update_cluster_info(559) status = 2, epoch = 0, 0, 0
Oct 24 10:34:16 update_cluster_info(559) status = 1, epoch = 1, 0, 1
Oct 24 10:34:16 update_cluster_info(559) status = 1, epoch = 1, 0, 1
Oct 24 10:34:17 cluster_queue_request(192) 0x1c44cb0 82
Oct 24 10:34:17 cluster_queue_request(192) 0x1c44cb0 11
Oct 24 10:34:17 do_lookup_vdi(238) looking for 75bd8c98-3c55-45d3-bc16-3af62601a3a5 36, a8ac4
Oct 24 10:34:17 add_vdi(327) we create a new vdi, 0 75bd8c98-3c55-45d3-bc16-3af62601a3a5 (36) 539545600, vid: a8ac4, base 0, cur 0
Oct 24 10:34:17 add_vdi(331) qemu doesn't specify the copies... 1
Oct 24 10:34:17 __sd_notify_done(733) done 0 690884
Oct 24 10:34:17 cluster_queue_request(192) 0x1c44cb0 82
Oct 24 10:34:17 cluster_queue_request(192) 0x1c44cb0 89
Oct 24 10:34:17 do_lookup_vdi(238) looking for 75bd8c98-3c55-45d3-bc16-3af62601a3a5 36, a8ac4
Oct 24 10:34:20 cluster_queue_request(192) 0x1c44cb0 82

Oct 24 10:34:16 read_epoch(2036) failed to read epoch 0
Oct 24 10:34:16 send_join_request(976) ip: 172.16.101.7, port: 7001
Oct 24 10:34:16 main(216) Sheepdog daemon (version 0.2.4-12-g737596b-dirty) started
Oct 24 10:34:16 get_vdi_bitmap_from(502) get the vdi bitmap from 172.16.101.7
Oct 24 10:34:16 get_vdi_bitmap_from(502) get the vdi bitmap from 172.16.101.7
Oct 24 10:34:17 update_cluster_info(559) status = 1, epoch = 1, 0, 0
Oct 24 10:34:17 check_epoch(1145) new node version 2 3 1
Oct 24 10:34:17 __sd_notify_done(733) done 0 690884
Oct 24 10:34:17 check_epoch(1145) new node version 2 3 2
Oct 24 10:34:21 check_epoch(1145) new node version 2 3 2
Oct 24 10:35:02 check_epoch(1145) new node version 2 3 2

Oct 24 10:34:16 read_epoch(2036) failed to read epoch 0
Oct 24 10:34:16 send_join_request(976) ip: 172.16.101.7, port: 7002
Oct 24 10:34:16 main(216) Sheepdog daemon (version 0.2.4-12-g737596b-dirty) started
Oct 24 10:34:16 get_vdi_bitmap_from(502) get the vdi bitmap from 172.16.101.7
Oct 24 10:34:16 update_cluster_info(559) status = 1, epoch = 1, 0, 0
Oct 24 10:34:17 update_cluster_info(559) status = 1, epoch = 1, 0, 1
Oct 24 10:34:17 get_obj_list(133) /mnt/sheep-0026-02/obj/00000001/
Oct 24 10:34:17 get_obj_list(133) /mnt/sheep-0026-02/obj/00000001/
Oct 24 10:34:17 get_obj_list(133) /mnt/sheep-0026-02/obj/00000001/
Oct 24 10:34:17 __sd_notify_done(733) done 0 690884

On another run with a clean sheepdog, I get a successful vdi create,
followed by

  vdi setattr -x 3eb42043-a142-4016-9a83-30e56101af23 lock <<< '002689c3-aeab-433d-bafc-acfb95dafe7c:16692:1319452195'

returning exit status 1. The three sheep.logs show

Oct 24 10:29:52 read_epoch(2036) failed to read epoch 0
Oct 24 10:29:52 send_join_request(976) ip: 172.16.101.7, port: 7000
Oct 24 10:29:52 main(216) Sheepdog daemon (version 0.2.4-12-g737596b-dirty) started
Oct 24 10:29:52 cluster_queue_request(192) 0x7f2b33bd9010 84
Oct 24 10:29:52 read_epoch(2036) failed to read epoch 0
Oct 24 10:29:52 update_cluster_info(559) status = 2, epoch = 0, 0, 0
Oct 24 10:29:53 update_cluster_info(559) status = 1, epoch = 1, 0, 1
Oct 24 10:29:53 update_cluster_info(559) status = 1, epoch = 1, 0, 1
Oct 24 10:29:53 __fill_obj_list(1657) try again, 0, 22
Oct 24 10:29:54 cluster_queue_request(192) 0x269ad20 82
Oct 24 10:29:54 cluster_queue_request(192) 0x269ad20 11
Oct 24 10:29:54 do_lookup_vdi(238) looking for 3eb42043-a142-4016-9a83-30e56101af23 36, a6dd79
Oct 24 10:29:54 add_vdi(327) we create a new vdi, 0 3eb42043-a142-4016-9a83-30e56101af23 (36) 539545600, vid: a6dd79, base 0, cur 0
Oct 24 10:29:54 add_vdi(331) qemu doesn't specify the copies... 1
Oct 24 10:29:55 __sd_notify_done(733) done 0 10935673
Oct 24 10:29:55 cluster_queue_request(192) 0x269ad20 82
Oct 24 10:29:55 cluster_queue_request(192) 0x269ad20 89
Oct 24 10:29:55 do_lookup_vdi(238) looking for 3eb42043-a142-4016-9a83-30e56101af23 36, a6dd79
Oct 24 10:30:02 cluster_queue_request(192) 0x269ad20 82
Oct 24 10:30:02 cluster_queue_request(192) 0x269ad20 82
Oct 24 10:30:02 cluster_queue_request(192) 0x269ad20 89
Oct 24 10:30:02 do_lookup_vdi(238) looking for 3eb42043-a142-4016-9a83-30e56101af23 36, a6dd79
Oct 24 10:30:02 ob_open(449) failed to open /mnt/sheep-0026-00/obj/00000003/20a6dd797853c6e2, No such file or directory
Oct 24 10:30:02 read_object(727) fail 20a6dd797853c6e2 -2
Oct 24 10:30:02 cluster_queue_request(192) 0x269ad20 82
Oct 24 10:30:02 cluster_queue_request(192) 0x269ad20 89
Oct 24 10:30:02 do_lookup_vdi(238) looking for 3eb42043-a142-4016-9a83-30e56101af23 36, a6dd79
Oct 24 10:30:02 cluster_queue_request(192) 0x269ad20 82
Oct 24 10:30:33 cluster_queue_request(192) 0x269ad20 82

Oct 24 10:29:52 read_epoch(2036) failed to read epoch 0
Oct 24 10:29:52 send_join_request(976) ip: 172.16.101.7, port: 7001
Oct 24 10:29:52 main(216) Sheepdog daemon (version 0.2.4-12-g737596b-dirty) started
Oct 24 10:29:53 get_vdi_bitmap_from(502) get the vdi bitmap from 172.16.101.7
Oct 24 10:29:53 get_vdi_bitmap_from(502) get the vdi bitmap from 172.16.101.7
Oct 24 10:29:53 update_cluster_info(559) status = 1, epoch = 1, 0, 0
Oct 24 10:29:55 __sd_notify_done(733) done 0 10935673
Oct 24 10:29:55 check_epoch(1145) new node version 2 3 2

Oct 24 10:29:52 read_epoch(2036) failed to read epoch 0
Oct 24 10:29:52 send_join_request(976) ip: 172.16.101.7, port: 7002
Oct 24 10:29:52 main(216) Sheepdog daemon (version 0.2.4-12-g737596b-dirty) started
Oct 24 10:29:53 get_vdi_bitmap_from(502) get the vdi bitmap from 172.16.101.7
Oct 24 10:29:53 update_cluster_info(559) status = 1, epoch = 1, 0, 0
Oct 24 10:29:53 update_cluster_info(559) status = 1, epoch = 1, 0, 1
Oct 24 10:29:53 get_obj_list(133) /mnt/sheep-0026-02/obj/00000001/
Oct 24 10:29:54 get_obj_list(133) /mnt/sheep-0026-02/obj/00000001/
Oct 24 10:29:54 get_obj_list(133) /mnt/sheep-0026-02/obj/00000001/
Oct 24 10:29:55 __sd_notify_done(733) done 0 10935673
Oct 24 10:30:02 ob_open(449) failed to open /mnt/sheep-0026-02/obj/00000003/20a6dd79886e13c4, No such file or directory

Best wishes,

Chris.