[Sheepdog] [PATCH 2/2] make vdi setattr atomic

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Wed Oct 26 11:17:25 CEST 2011


At Mon, 24 Oct 2011 11:36:41 +0100,
Chris Webb wrote:
> 
> MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp> writes:
> 
> > Yes, I pushed many patches which simplify cluster communications, so
> > the problem might be solved with the current master branch.  Anyway,
> > I'll try to find what caused the problem. :)
> 
> Hi Kazutaka. I pulled the current head of master, 5d8ab0de8e. I'm afraid
> this now breaks quite spectacularly when I first try to create a drive
> (which does a vdi create followed by a handful of vdi setattrs and getattrs)
> on my one-host, three-sheep minicluster: a setattr fails, and then
> 
> 0026# collie vdi list
>   name        id    size    used  shared    creation time   vdi id
> ------------------------------------------------------------------
> failed to read object, 800a8ac400000000 Remote node has a new epoch
> failed to read a inode header
> 
> The three sheep.logs read
> 
> Oct 24 10:34:16 read_epoch(2036) failed to read epoch 0
> Oct 24 10:34:16 send_join_request(976) ip: 172.16.101.7, port: 7000
> Oct 24 10:34:16 main(216) Sheepdog daemon (version 0.2.4-12-g737596b-dirty) started
> Oct 24 10:34:16 cluster_queue_request(192) 0x7f9336afa010 84
> Oct 24 10:34:16 read_epoch(2036) failed to read epoch 0
> Oct 24 10:34:16 update_cluster_info(559) status = 2, epoch = 0, 0, 0
> Oct 24 10:34:16 update_cluster_info(559) status = 1, epoch = 1, 0, 1
> Oct 24 10:34:16 update_cluster_info(559) status = 1, epoch = 1, 0, 1
> Oct 24 10:34:17 cluster_queue_request(192) 0x1c44cb0 82
> Oct 24 10:34:17 cluster_queue_request(192) 0x1c44cb0 11
> Oct 24 10:34:17 do_lookup_vdi(238) looking for 75bd8c98-3c55-45d3-bc16-3af62601a3a5 36, a8ac4
> Oct 24 10:34:17 add_vdi(327) we create a new vdi, 0 75bd8c98-3c55-45d3-bc16-3af62601a3a5 (36) 539545600, vid: a8ac4, base 0, cur 0 
> Oct 24 10:34:17 add_vdi(331) qemu doesn't specify the copies... 1
> Oct 24 10:34:17 __sd_notify_done(733) done 0 690884
> Oct 24 10:34:17 cluster_queue_request(192) 0x1c44cb0 82
> Oct 24 10:34:17 cluster_queue_request(192) 0x1c44cb0 89
> Oct 24 10:34:17 do_lookup_vdi(238) looking for 75bd8c98-3c55-45d3-bc16-3af62601a3a5 36, a8ac4
> Oct 24 10:34:20 cluster_queue_request(192) 0x1c44cb0 82
> 
> Oct 24 10:34:16 read_epoch(2036) failed to read epoch 0
> Oct 24 10:34:16 send_join_request(976) ip: 172.16.101.7, port: 7001
> Oct 24 10:34:16 main(216) Sheepdog daemon (version 0.2.4-12-g737596b-dirty) started
> Oct 24 10:34:16 get_vdi_bitmap_from(502) get the vdi bitmap from 172.16.101.7
> Oct 24 10:34:16 get_vdi_bitmap_from(502) get the vdi bitmap from 172.16.101.7
> Oct 24 10:34:17 update_cluster_info(559) status = 1, epoch = 1, 0, 0
> Oct 24 10:34:17 check_epoch(1145) new node version 2 3 1
> Oct 24 10:34:17 __sd_notify_done(733) done 0 690884
> Oct 24 10:34:17 check_epoch(1145) new node version 2 3 2
> Oct 24 10:34:21 check_epoch(1145) new node version 2 3 2
> Oct 24 10:35:02 check_epoch(1145) new node version 2 3 2
> 
> Oct 24 10:34:16 read_epoch(2036) failed to read epoch 0
> Oct 24 10:34:16 send_join_request(976) ip: 172.16.101.7, port: 7002
> Oct 24 10:34:16 main(216) Sheepdog daemon (version 0.2.4-12-g737596b-dirty) started
> Oct 24 10:34:16 get_vdi_bitmap_from(502) get the vdi bitmap from 172.16.101.7
> Oct 24 10:34:16 update_cluster_info(559) status = 1, epoch = 1, 0, 0
> Oct 24 10:34:17 update_cluster_info(559) status = 1, epoch = 1, 0, 1
> Oct 24 10:34:17 get_obj_list(133) /mnt/sheep-0026-02/obj/00000001/
> Oct 24 10:34:17 get_obj_list(133) /mnt/sheep-0026-02/obj/00000001/
> Oct 24 10:34:17 get_obj_list(133) /mnt/sheep-0026-02/obj/00000001/
> Oct 24 10:34:17 __sd_notify_done(733) done 0 690884
> 
> On another run with a clean sheepdog, I get a successful vdi create, followed
> by
> 
>   vdi setattr -x 3eb42043-a142-4016-9a83-30e56101af23 lock <<< '002689c3-aeab-433d-bafc-acfb95dafe7c:16692:1319452195'
> 
> returning exit status 1. The three sheep.logs show
> 
> Oct 24 10:29:52 read_epoch(2036) failed to read epoch 0
> Oct 24 10:29:52 send_join_request(976) ip: 172.16.101.7, port: 7000
> Oct 24 10:29:52 main(216) Sheepdog daemon (version 0.2.4-12-g737596b-dirty) started
> Oct 24 10:29:52 cluster_queue_request(192) 0x7f2b33bd9010 84
> Oct 24 10:29:52 read_epoch(2036) failed to read epoch 0
> Oct 24 10:29:52 update_cluster_info(559) status = 2, epoch = 0, 0, 0
> Oct 24 10:29:53 update_cluster_info(559) status = 1, epoch = 1, 0, 1
> Oct 24 10:29:53 update_cluster_info(559) status = 1, epoch = 1, 0, 1
> Oct 24 10:29:53 __fill_obj_list(1657) try again, 0, 22
> Oct 24 10:29:54 cluster_queue_request(192) 0x269ad20 82
> Oct 24 10:29:54 cluster_queue_request(192) 0x269ad20 11
> Oct 24 10:29:54 do_lookup_vdi(238) looking for 3eb42043-a142-4016-9a83-30e56101af23 36, a6dd79
> Oct 24 10:29:54 add_vdi(327) we create a new vdi, 0 3eb42043-a142-4016-9a83-30e56101af23 (36) 539545600, vid: a6dd79, base 0, cur 0 
> Oct 24 10:29:54 add_vdi(331) qemu doesn't specify the copies... 1
> Oct 24 10:29:55 __sd_notify_done(733) done 0 10935673
> Oct 24 10:29:55 cluster_queue_request(192) 0x269ad20 82
> Oct 24 10:29:55 cluster_queue_request(192) 0x269ad20 89
> Oct 24 10:29:55 do_lookup_vdi(238) looking for 3eb42043-a142-4016-9a83-30e56101af23 36, a6dd79
> Oct 24 10:30:02 cluster_queue_request(192) 0x269ad20 82
> Oct 24 10:30:02 cluster_queue_request(192) 0x269ad20 82
> Oct 24 10:30:02 cluster_queue_request(192) 0x269ad20 89
> Oct 24 10:30:02 do_lookup_vdi(238) looking for 3eb42043-a142-4016-9a83-30e56101af23 36, a6dd79
> Oct 24 10:30:02 ob_open(449) failed to open /mnt/sheep-0026-00/obj/00000003/20a6dd797853c6e2, No such file or directory
> Oct 24 10:30:02 read_object(727) fail 20a6dd797853c6e2 -2
> Oct 24 10:30:02 cluster_queue_request(192) 0x269ad20 82
> Oct 24 10:30:02 cluster_queue_request(192) 0x269ad20 89
> Oct 24 10:30:02 do_lookup_vdi(238) looking for 3eb42043-a142-4016-9a83-30e56101af23 36, a6dd79
> Oct 24 10:30:02 cluster_queue_request(192) 0x269ad20 82
> Oct 24 10:30:33 cluster_queue_request(192) 0x269ad20 82
> 
> Oct 24 10:29:52 read_epoch(2036) failed to read epoch 0
> Oct 24 10:29:52 send_join_request(976) ip: 172.16.101.7, port: 7001
> Oct 24 10:29:52 main(216) Sheepdog daemon (version 0.2.4-12-g737596b-dirty) started
> Oct 24 10:29:53 get_vdi_bitmap_from(502) get the vdi bitmap from 172.16.101.7
> Oct 24 10:29:53 get_vdi_bitmap_from(502) get the vdi bitmap from 172.16.101.7
> Oct 24 10:29:53 update_cluster_info(559) status = 1, epoch = 1, 0, 0
> Oct 24 10:29:55 __sd_notify_done(733) done 0 10935673
> Oct 24 10:29:55 check_epoch(1145) new node version 2 3 2
> 
> Oct 24 10:29:52 read_epoch(2036) failed to read epoch 0
> Oct 24 10:29:52 send_join_request(976) ip: 172.16.101.7, port: 7002
> Oct 24 10:29:52 main(216) Sheepdog daemon (version 0.2.4-12-g737596b-dirty) started
> Oct 24 10:29:53 get_vdi_bitmap_from(502) get the vdi bitmap from 172.16.101.7
> Oct 24 10:29:53 update_cluster_info(559) status = 1, epoch = 1, 0, 0
> Oct 24 10:29:53 update_cluster_info(559) status = 1, epoch = 1, 0, 1
> Oct 24 10:29:53 get_obj_list(133) /mnt/sheep-0026-02/obj/00000001/
> Oct 24 10:29:54 get_obj_list(133) /mnt/sheep-0026-02/obj/00000001/
> Oct 24 10:29:54 get_obj_list(133) /mnt/sheep-0026-02/obj/00000001/
> Oct 24 10:29:55 __sd_notify_done(733) done 0 10935673
> Oct 24 10:30:02 ob_open(449) failed to open /mnt/sheep-0026-02/obj/00000003/20a6dd79886e13c4, No such file or directory

Hi Chris,

I've confirmed some bugs that were introduced by the recent cluster
driver patches.  I'll fix them soon.

Thanks,

Kazutaka

> 
> Best wishes,
> 
> Chris.


