[Sheepdog] [PATCH 4/4] [PATCH 04/10] sheep: update node information and epoch from

Yibin Shen zituan at taobao.com
Tue May 15 05:51:25 CEST 2012


On Tue, May 15, 2012 at 10:53 AM, Liu Yuan <namei.unix at gmail.com> wrote:
> On 05/15/2012 03:31 AM, Christoph Hellwig wrote:
>
>> Update the node and vnodes lists as well as the epoch information on the
>> master node before replying to the slaves so that we can avoid a race
>> window which gives different sheep the same starting epoch.
>>
>> This detailed order of corosync messages:
>>
>> slave0:cfgchange(join) ->
>>                              master:send-response(0)
>>
>> slave1:cfgchange(join) ->
>>                              master:send-response(1)
>>                              master:recv-response(0) -> inc_epoch
>>                              master:recv-response(1) -> inc_epoch
>>
>> will cause two responses to contain the same epoch, which gives one slave
>> the wrong starting epoch, and thus causes epoch mismatches in the cluster.
>>
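
The ordering quoted above can be modelled in a few lines of Python. This is
only a toy sketch, not sheepdog code: the Master class, its method names, and
the assumption that each join response carries "current epoch + 1" as the
starting epoch are all hypothetical, chosen to mirror the send-response and
recv-response steps in the diagram.

```python
class Master:
    """Toy model of the master's epoch handling (hypothetical, not sheepdog code)."""

    def __init__(self, epoch, update_before_reply):
        self.epoch = epoch
        self.update_before_reply = update_before_reply

    def send_response(self, slave):
        # Patched order: bump the epoch first, then reply with it.
        if self.update_before_reply:
            self.epoch += 1
            return (slave, self.epoch)
        # Racy order: reply with a predicted epoch, bump only later.
        return (slave, self.epoch + 1)

    def recv_response(self, response):
        # Racy order: the epoch is bumped only when the master sees
        # its own response delivered back by the cluster layer.
        if not self.update_before_reply:
            self.epoch += 1


def run(update_before_reply):
    master = Master(epoch=1, update_before_reply=update_before_reply)
    # Both joins are answered before either response is delivered back.
    in_flight = [master.send_response("slave0"),
                 master.send_response("slave1")]
    for response in in_flight:
        master.recv_response(response)
    return dict(in_flight), master.epoch
```

Under these assumptions, run(False) hands both slaves the same starting epoch,
while run(True) gives each slave a distinct one.
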
>> It can be fairly easily reproduced by starting a number of sheep very
>> quickly on a formatted cluster.  The actual reproducer is part of a bigger
>> software project, but my coworkers and I hope to contribute a simpler
>> reproducer as part of a test suite soon.

We are really interested in this test suite. Can you give us a
brief introduction?

>>
>> Implementing the fix requires passing the authoritative node list from the
>> cluster drivers to sd_check_join_cb, similar to how we do it for other
>> callbacks from the cluster drivers.  The callers in corosync and zookeeper
>> don't actually have this list as they didn't add the joining node yet, so
>> this adds additional complications.  The corosync version of this has been
>> heavily tested, but the zookeeper variant is entirely untested so far.
>
>
> We actually hit this same problem when running zookeeper, when we tried
> to remove the group_fd register/un-register to make the main thread more
> responsive. The main thread is very unwieldy for now and has become a major
> bottleneck for clusters with a large number of nodes. Our effort is aimed
> at making it slimmer.
>
> Thanks for all your work on it, I am going to give this patch a review.
>
> Thanks,
> Yuan
> --
> sheepdog mailing list
> sheepdog at lists.wpkg.org
> http://lists.wpkg.org/mailman/listinfo/sheepdog
