[sheepdog] [PATCH v4 1/5] zookeeper: fixed concurrent startup error

Kai Zhang kyle at zelin.io
Tue Jun 18 10:25:00 CEST 2013


On Jun 18, 2013, at 4:06 PM, Liu Yuan <namei.unix at gmail.com> wrote:

> On 06/18/2013 02:15 PM, Kai Zhang wrote:
>> The current implementation of the zookeeper driver has a race when multiple
>> sheep start up concurrently.
>> 
>> Consider the following situation:
>> 1. There is a 3 node cluster: sheep1, sheep2, sheep3.
>> 2. Both sheep1 and sheep2 leave the cluster.
>> 3. Both sheep1 and sheep2 start up after their previous zookeeper sessions
>>   time out.
>> 4. Sheep3 leaves the cluster before sheep1 and sheep2 receive their join
>>   requests from zookeeper.
>> 5. When sheep1 and sheep2 receive the join requests, both of them assume they
>>   are master because zk_member_empty() returns true.
> 
> Could you write a test to demonstrate that this happens in real life in the
> first place?

Ok. I will write a test to reproduce it.

> 
>> 
>> The new implementation avoids this problem because a sheep assumes it is
>> master only if it successfully creates the master node.
> 
> If you can write how the new impl would work in the commit log, we will
> spend less time on reading the code to get how it works.

Actually, the new implementation is quite simple.
The core function is zk_compete_master().
When this function returns, one of the following holds:
- this sheep is the master (so that it can handle its own join request), or
- a remote master has been elected and has joined the cluster successfully.
This function is called when the sheep joins the cluster for the first time or
when it detects a master leave event.
'is_master' indicates whether this sheep is the master.
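
The difference between the two approaches can be illustrated with a small
in-memory model. This is only a sketch in Python, not the sheepdog C code:
FakeCoordinator, MASTER_PATH, old_style_election(), compete_master() and
on_master_leave() are hypothetical names made up for the illustration, and the
real zk_compete_master() also waits until a remote master has joined, which is
omitted here. The only assumption carried over is that creating the master
node is atomic, so at most one sheep can succeed.

```python
MASTER_PATH = "/sheepdog/master"  # hypothetical path, for illustration only


class FakeCoordinator:
    """In-memory stand-in for the coordination service (hypothetical)."""

    def __init__(self):
        self.nodes = {}  # path -> name of the sheep that created it

    def create(self, path, owner):
        """Create `path` atomically; return False if it already exists."""
        if path in self.nodes:
            return False
        self.nodes[path] = owner
        return True

    def delete(self, path):
        """Remove `path` if present (models the master's node vanishing)."""
        self.nodes.pop(path, None)


def old_style_election(members_seen_by_each):
    """Check-then-act: every sheep that saw an empty member list claims
    mastership, so several sheep can claim it at the same time."""
    return [name for name, members in members_seen_by_each.items()
            if not members]


def compete_master(zk, me):
    """Return True if `me` created the master node (is_master), or False
    if another sheep already holds it."""
    return zk.create(MASTER_PATH, me)


def on_master_leave(zk, survivors):
    """The master's node is gone: survivors compete again, one wins."""
    zk.delete(MASTER_PATH)
    return [s for s in survivors if compete_master(zk, s)]


if __name__ == "__main__":
    # Old behaviour: both sheep saw an empty member list.
    print(old_style_election({"sheep1": [], "sheep2": []}))

    # New behaviour: only the first successful create wins.
    zk = FakeCoordinator()
    for sheep in ("sheep1", "sheep2"):
        print(sheep, "is_master =", compete_master(zk, sheep))
```

In the model, two sheep that both observed an empty member list both claim
mastership under the old check, while under the create-based scheme only the
first create succeeds and the second sheep learns that a remote master exists.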

I will update the commit log and add some comments to the code.

Thanks,
Kyle