On Sun, Jul 14, 2013 at 12:08:46AM +0900, MORITA Kazutaka wrote:
> From: MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp>
>
> The current procedure to handle sheep join is as follows.
>
>  1. The joining node sends a join request.
>  2. The master node accepts the request.
>  3. All the nodes update cluster members.
>
> This procedure has some problems:
>
>  - The master election is too complex to maintain.
>    It is very difficult to make sure that the implementation is
>    correct.
>
>  - The master node can fail while it is accepting the joining node.
>    The newly elected master has to take over the process, but it's
>    usually difficult to implement because we have to know what the
>    previous master did and what it did not before its failure.
>
> This patch changes the sheep join procedure to the following.
>
>  1. The joining node sends a join request.
>  2. Some of the existing nodes accept the request.

It seems that all the nodes in the cluster accept the request, no?

>  3. All the nodes update cluster members.
>
> It is allowed for the multiple nodes to call sd_accept_handler()
> against the same join request, but at least one node must have to do
> it.  With this change, we can eliminate a master, and node failure
> while accepting node join is also allowed.
>

Why is sd_accept_handler() reentrant from the cluster's point of view? I
noticed that, e.g., push_join_response() of the zk driver is called on
all the nodes too. So if the following case happens, can sheep handle it?

There are 2 nodes in the cluster {A, B}, and C is joining the cluster.

A -> push_join_response() returns quickly; the watchers of A, B, and C
     are called to handle EVENT_ACCEPT from A.
B -> push_join_response() returns slowly because of the network; A, B,
     and C then handle EVENT_ACCEPT from B.

Simply put, can sheep handle multiple EVENT_ACCEPT events for the same
node?

Thanks,
Yuan
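
P.S. To make the question concrete, here is a minimal sketch of what "handling multiple EVENT_ACCEPT of the same node" could look like: the handler checks whether the joining node is already a member and ignores the event if so. This is only an illustration under my own assumptions. `struct members`, `handle_accept()` and `replay_scenario()` are hypothetical names, not sheepdog's actual data structures or the real sd_accept_handler().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_NODES 16
#define NAME_LEN  32

/* Hypothetical membership table; not sheepdog's real structure. */
struct members {
	char   names[MAX_NODES][NAME_LEN];
	size_t nr;
};

static bool is_member(const struct members *m, const char *name)
{
	for (size_t i = 0; i < m->nr; i++)
		if (strcmp(m->names[i], name) == 0)
			return true;
	return false;
}

/*
 * Hypothetical EVENT_ACCEPT handler.
 * Returns true if the node was added, false if the event was a
 * duplicate (or the table is full) and was therefore ignored.
 */
static bool handle_accept(struct members *m, const char *joining)
{
	if (is_member(m, joining))
		return false;	/* second accept for the same node: no-op */
	if (m->nr == MAX_NODES)
		return false;
	strncpy(m->names[m->nr], joining, NAME_LEN - 1);
	m->names[m->nr][NAME_LEN - 1] = '\0';
	m->nr++;
	return true;
}

/*
 * Replays the scenario from this mail on one node's local view:
 * cluster {A, B}, C joins, and the accept for C arrives twice
 * (once triggered by A, once later by B).
 */
static size_t replay_scenario(void)
{
	struct members m = { .nr = 0 };

	handle_accept(&m, "A");
	handle_accept(&m, "B");
	handle_accept(&m, "C");	/* EVENT_ACCEPT pushed by A */
	handle_accept(&m, "C");	/* late EVENT_ACCEPT pushed by B */

	assert(m.nr == 3);	/* C must appear only once */
	return m.nr;
}
```

If the real handler is not idempotent in this sense, the second event would presumably add C twice or run the join bookkeeping twice, which is the case I am worried about.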