[sheepdog] [PATCH v3] sheep: remove master node

Kai Zhang kyle at zelin.io
Tue Jul 23 10:30:33 CEST 2013


On Jul 23, 2013, at 4:00 PM, MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp> wrote:

> At Sat, 20 Jul 2013 15:21:55 +0800,
> Kai Zhang wrote:
>> 
>> On Jul 19, 2013, at 12:01 PM, MORITA Kazutaka <morita.kazutaka at gmail.com> wrote:
>> 
>>> This patch changes the sheep join procedure to the following.
>>> 
>>> 1. The joining node sends a join request.
>>> 2. Some of the existing nodes accept the request.
>>> 3. All the nodes update cluster members.
>>> 
>>> Multiple nodes are allowed to call sd_join_handler() against the
>>> same join request, but at least one node must do it.  With this
>>> change, we can eliminate the master, and a node failure while
>>> accepting a node join is also tolerated.
>>> 
>>> Removing a master from zookeeper is not easy since it doesn't expect
>>> that multiple nodes send EVENT_ACCEPT.  I'll leave this for another
>>> day.
>> 
>> 
>> Here are 2 questions in my mind:
>> 
>> 1. Based on the current implementation of the cluster driver, we accept all join requests
>> when the cluster is running.
>> However, consider the following scenario:
>> - cluster runs with A, B, C, D
>> - A quits for some reasons
>> - after A quits, lots of data operations happened
>> - B, C, D all quit for some reasons
>> - A comes back with old data
>> - B, C, D come back
>> In this scenario, old data will overwrite new data, no?
> 
> Yes, but it also happens without my series.  The current implementation
> allows the client to see the old data, and that's the reason I added an
> sd_printf with high priority like
> 
>  sd_printf(SDOG_ALERT, "clients may see old data");
> 
> Previously, I tried to add code to stop removing stale objects when
> the above situation happens, and to add a chance to recover the correct
> objects manually.
> 
>  http://lists.wpkg.org/pipermail/sheepdog/2013-May/009869.html
> 
> However, we agreed that stopping a sheepdog cluster and doing manual
> recovery is not an acceptable approach for service providers, and we are
> still looking for a better way to recover successfully from the above
> problem.
> 

I think in the old implementation, when B, C, D come back, A will kill itself
because its epoch is lower than the others'. This would protect the data somewhat.
However, if a client connected to A before B, C, D came back, the client
would receive old data.
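
As a rough illustration of that check (a sketch only, not the actual sheep
code; epoch_is_stale() and its arguments are hypothetical names), a node
could compare its own epoch against the epochs reported by the peers it
can currently see:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if a node whose epoch is `mine` is
 * behind at least one visible peer and should therefore exit,
 * 0 otherwise.
 */
static int epoch_is_stale(uint32_t mine, const uint32_t *peers, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (peers[i] > mine)
			return 1;
	return 0;
}
```

With no peers visible yet (n == 0) the check cannot fire, which corresponds
to the window in which a client connected to A could still read old data.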


>> 
>> 2. All sheep that join an empty cluster at the same time will always succeed.
> 
> Even if sheep join the empty cluster at the same time, the cluster
> driver orders the join events.

Yes, but zookeeper cannot handle it right now.
However, I think there is a way to remove it.

I would like to try after this patch is merged.


> 
>> Is this safe?
>> 
>> 
>> By the way, in terms of zookeeper, I think it can work well when there are multiple EVENT_ACCEPT
>> events for one join request.
>> This is because an update event will only trigger the zookeeper driver to fetch an event from the queue.
>> If the event is "accept", then it will handle it by calling sd_accept_handler() and move to the next node.
> 
> How can we guarantee that more than one sheep fetch the accept event at
> the same time?

Well, the "fetch" will not remove the event from the queue.
The queue is shared by all sheep, and no one except the administrator can remove any item from it.
Each sheep uses an integer to record its own head position in the queue.
One sheep fetching the event will not conflict with the others.
And finally, all sheep should fetch the event and handle it.
So this should not be a problem.
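
To make the idea concrete, here is a minimal sketch of a shared queue with
a per-sheep head position. It is an illustration under assumptions, not the
zookeeper driver code: struct shared_queue, struct sheep, queue_push() and
sheep_on_update() are hypothetical names, and the counter only stands in
for a call to sd_accept_handler().

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAP 16

enum event_type { EVENT_JOIN, EVENT_ACCEPT };

/* Shared, append-only queue: fetching never removes an entry. */
struct shared_queue {
	enum event_type events[QUEUE_CAP];
	size_t len;
};

/* Each sheep keeps only its own read position into the shared queue. */
struct sheep {
	size_t head;
	int accepts_handled;
};

static void queue_push(struct shared_queue *q, enum event_type ev)
{
	assert(q->len < QUEUE_CAP);
	q->events[q->len++] = ev;
}

/*
 * Called on every update notification.  Returns 1 if an event was
 * fetched and handled, 0 if this sheep has already seen everything.
 */
static int sheep_on_update(struct sheep *s, const struct shared_queue *q)
{
	if (s->head >= q->len)
		return 0;
	if (q->events[s->head] == EVENT_ACCEPT)
		s->accepts_handled++;	/* stand-in for sd_accept_handler() */
	s->head++;
	return 1;
}
```

Because a fetch only advances the private head of the calling sheep, two
sheep reading the same entry do not interfere with each other, and a later
update notification finds nothing new for a sheep that is already caught up.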

> 
>> The following update event will trigger it to check whether there is a new event, but not to handle the EVENT_ACCEPT
>> again.
>> So sd_accept_handler() will be called only once for one join request in one sheep.
>> If so, we can remove the "master" from the zookeeper driver entirely.
> 
> Even if it is true, we cannot remove the zk master because my patch
> cannot handle concurrent joins of sheep with the zk driver.
> 

I would like to remove the master from zookeeper after this patch is merged.
However, I think there is no relationship between the "master" and the "ACCEPT" event.

Thanks,
Kyle

