[sheepdog] [PATCH] sheep: remove master node
Liu Yuan
namei.unix at gmail.com
Sun Jul 14 08:25:12 CEST 2013
On Sun, Jul 14, 2013 at 12:08:46AM +0900, MORITA Kazutaka wrote:
> From: MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp>
>
> The current procedure to handle sheep join is as follows.
>
> 1. The joining node sends a join request.
> 2. The master node accepts the request.
> 3. All the nodes update cluster members.
>
> This procedure has some problems:
>
> - The master election is too complex to maintain.
>   It is very difficult to make sure that the implementation is
>   correct.
>
> - The master node can fail while it is accepting the joining node.
>   The newly elected master has to take over the process, but it's
>   usually difficult to implement because we have to know what the
>   previous master did and what it did not before its failure.
>
> This patch changes the sheep join procedure to the following.
>
> 1. The joining node sends a join request.
> 2. Some of the existing nodes accept the request.
It seems that all the nodes in the cluster accept the request, no?
> 3. All the nodes update cluster members.
>
> Multiple nodes are allowed to call sd_accept_handler() against the
> same join request, but at least one node must do it. With this
> change, we can eliminate the master, and a node failure while
> accepting a node join is also tolerated.
>
Why is sd_accept_handler() reentrant from the cluster's point of view? I noticed
that, e.g., push_join_response() of the zk driver is called on all the nodes too.
So if the following case happens, can sheep handle it?
There are 2 nodes in the cluster, {A, B}, and C is joining the cluster.
A -> push_join_response() returns quickly, and the watchers of A, B and C are
called to handle EVENT_ACCEPT from A.
B -> push_join_response() returns slowly because of the network, and A, B and C
then handle EVENT_ACCEPT from B.
Simply put, can sheep handle multiple EVENT_ACCEPT events for the same joining node?
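To make the scenario concrete, here is a minimal sketch of what tolerating a
duplicate EVENT_ACCEPT could look like. It is not the actual sheep code:
struct cluster_view, is_member() and handle_accept_event() are hypothetical
names made up for illustration, and the real accept path does much more than
update a member list. The only point is that a second accept for a node that
is already a member would have to become a no-op.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_NODES     16
#define NODE_NAME_LEN 32

/* Hypothetical, simplified membership view; not sheep's real structures. */
struct cluster_view {
	char members[MAX_NODES][NODE_NAME_LEN];
	size_t nr_members;
};

/* Return true if 'node' is already part of the view. */
static bool is_member(const struct cluster_view *v, const char *node)
{
	for (size_t i = 0; i < v->nr_members; i++)
		if (strcmp(v->members[i], node) == 0)
			return true;
	return false;
}

/*
 * Handle one accept event for 'joiner'.
 * Returns true if the view changed, false if the event was ignored
 * (duplicate accept for the same node, or the view is full).
 */
static bool handle_accept_event(struct cluster_view *v, const char *joiner)
{
	assert(v != NULL && joiner != NULL);

	if (is_member(v, joiner))
		return false;	/* second EVENT_ACCEPT for the same node */
	if (v->nr_members >= MAX_NODES)
		return false;

	/* Copy the name and make sure it is NUL-terminated. */
	strncpy(v->members[v->nr_members], joiner, NODE_NAME_LEN - 1);
	v->members[v->nr_members][NODE_NAME_LEN - 1] = '\0';
	v->nr_members++;
	return true;
}
```

With {A, B} already in the view, calling handle_accept_event(&v, "C") once
for the accept from A and once for the accept from B changes the view only
the first time; the second call returns false and leaves 3 members.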
Thanks
Yuan