[sheepdog] [Sheepdog] [PATCH 0/4] fix a race when multiple sheep join a cluster very quickly

Thu May 17 10:45:53 CEST 2012

On Wed, May 16, 2012 at 10:49 PM, Christoph Hellwig <hch at infradead.org> wrote:
> On Wed, May 16, 2012 at 10:44:14PM +0800, Yunkai Zhang wrote:
>> Hi Hellwig, do you have any comments on my latest unregister patch,
>> which has been updated to reflect your previous comments?
>
> I like it, but I ran into an issue where a VDI created on one
> node doesn't seem to be found on another node created later.
>
> The testcase looks something like this:
>
> sheep -p 7000 -D /tmp/sheep/7000
> collie cluster format --copies=1 -p 7000
> collie vdi create 'test-vdi' 300M -p 7000
> dd if=/dev/zero count=300M | collie vdi write test-vdi -p 7000
> collie vdi read test-vdi 0 1M -p 7000 > /dev/null
>
> sheep -p 7001 -D /tmp/sheep/7001
> collie vdi read test-vdi 0 300M -p 7001 > /dev/null
>
> and the second vdi read can't find the VDI.

I have fixed this issue. The root cause is that I moved
update_cluster_info() from __sd_join_done() to sd_join_handler(), but
update_cluster_info() also changes sys->status.

Now the update of sys->status is done in __sd_join_done().
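
To make the split concrete, here is a minimal sketch of the idea, not the
actual patch. Only sys->status comes from the code discussed above; the
struct layout, the enum values, the "pending" field and the function names
join_handler()/join_done() are hypothetical stand-ins for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are NOT the real sheepdog definitions. */
enum cluster_status { STATUS_WAIT_FOR_JOIN, STATUS_OK };

struct system_info {
	enum cluster_status status;	/* what the rest of the daemon sees */
	enum cluster_status pending;	/* decided early, published later */
	bool join_in_progress;
};

/*
 * Phase 1: runs when the join event arrives.  It records the decision
 * but deliberately leaves sys->status untouched.
 */
static void join_handler(struct system_info *sys, enum cluster_status decided)
{
	sys->pending = decided;
	sys->join_in_progress = true;
}

/*
 * Phase 2: runs once the join work has completed.  Only here does the
 * new status become visible.
 */
static void join_done(struct system_info *sys)
{
	assert(sys->join_in_progress);
	sys->status = sys->pending;
	sys->join_in_progress = false;
}
```

With this shape, anything that checks sys->status between the two phases
still sees the old value.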

I also found and fixed another potential deadlock in __sd_join(), where
a sheep may try to fetch vdi_bitmap from itself.
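
One way to avoid that kind of self-fetch is to skip the local node when
choosing where to read the bitmap from. The sketch below only illustrates
that idea under my own assumptions: struct node, node_eq() and
pick_bitmap_source() are hypothetical names, not functions from the sheep
source, and the real fix may look different.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical node descriptor; not the real sheepdog structure. */
struct node {
	char addr[16];
	int port;
};

/* Two nodes are the same if both address and port match. */
static int node_eq(const struct node *a, const struct node *b)
{
	return a->port == b->port && strcmp(a->addr, b->addr) == 0;
}

/*
 * Return the index of the first member that is not the local node,
 * or -1 if the local node is the only member (nothing to fetch from).
 */
static int pick_bitmap_source(const struct node *members, size_t nr,
			      const struct node *self)
{
	for (size_t i = 0; i < nr; i++)
		if (!node_eq(&members[i], self))
			return (int)i;
	return -1;
}
```

A caller would treat -1 as "no remote copy to fetch" and skip the request
instead of sending it to itself.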

I'll send V4 soon.


-- 
Yunkai Zhang
Work at Taobao