[sheepdog] [PATCH v3] sheep: remove master node

Tue Jul 23 10:00:53 CEST 2013

At Sat, 20 Jul 2013 15:21:55 +0800,
Kai Zhang wrote:
> 
> On Jul 19, 2013, at 12:01 PM, MORITA Kazutaka <morita.kazutaka at gmail.com> wrote:
> 
> > This patch changes the sheep join procedure to the following.
> > 
> > 1. The joining node sends a join request.
> > 2. Some of the existing nodes accept the request.
> > 3. All the nodes update cluster members.
> > 
> > Multiple nodes are allowed to call sd_join_handler() for the same
> > join request, but at least one node must do it.  With this change,
> > we can eliminate the master, and a node failure while a join is
> > being accepted is also tolerated.
> > 
> > Removing a master from zookeeper is not easy since it doesn't expect
> > that multiple nodes send EVENT_ACCEPT.  I'll leave this for another
> > day.
> 
> 
> Here are 2 questions in my mind:
> 
> 1. Based on the current implementation of the cluster driver, we accept all join requests
> when the cluster is running.
> However, consider the following scenario:
> - cluster runs with A, B, C, D
> - A quits for some reasons
> - after A quits, lots of data operations happened
> - B, C, D all quit for some reasons
> - A comes back with old data
> - B, C, D come back
> In this scenario, old data will overwrite new data, no?

Yes, but it also happens without my series.  The current implementation
allows the client to see the old data, and that's the reason I added an
sd_printf with high priority like

  sd_printf(SDOG_ALERT, "clients may see old data");

Previously, I tried to add code to stop removing stale objects when
the above situation happens, and to give a chance to recover the
correct objects manually.

  http://lists.wpkg.org/pipermail/sheepdog/2013-May/009869.html

However, we agreed that stopping a sheepdog cluster and doing manual
recovery is not an acceptable approach for service providers, and we
are still looking for a better way to recover successfully from the
above problem.
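
A minimal sketch of the scenario above, as a toy model in C.  It is not
sheepdog code: the toy_* names, the per-node version counter, and the
rule that the first node to join an empty cluster defines the data the
cluster serves are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only, not sheepdog code.  Each node remembers the version
 * of the data it held when it left the cluster. */
struct toy_node {
	const char *name;
	unsigned int data_version;
};

struct toy_cluster {
	size_t nr_members;
	unsigned int data_version;	/* version the cluster serves */
	bool stale_alert;		/* newer data was replaced by older data */
};

/* Assumption: the first node to join an empty cluster defines its data,
 * and every later joiner is overwritten with the cluster's version. */
static void toy_join(struct toy_cluster *c, struct toy_node *n)
{
	assert(c != NULL && n != NULL);

	if (c->nr_members == 0)
		c->data_version = n->data_version;
	else if (n->data_version > c->data_version)
		c->stale_alert = true;	/* "clients may see old data" */

	n->data_version = c->data_version;
	c->nr_members++;
}

/* Replays the scenario: A left at version 1, B, C and D went on to
 * version 5, everyone quit, and then A came back first. */
static struct toy_cluster toy_replay_scenario(void)
{
	struct toy_node a = { "A", 1 }, b = { "B", 5 };
	struct toy_node c = { "C", 5 }, d = { "D", 5 };
	struct toy_cluster cluster = { 0, 0, false };

	toy_join(&cluster, &a);
	toy_join(&cluster, &b);
	toy_join(&cluster, &c);
	toy_join(&cluster, &d);
	return cluster;
}
```

In this model the cluster ends up serving A's old version, and the
alert flag is the only trace that newer data existed.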

> 
> 2. All sheep that join an empty cluster at the same time will always succeed.

Even if sheep join an empty cluster at the same time, the cluster
driver orders the join events.

> Is this safe?
> 
> 
> By the way, in terms of zookeeper, I think it can work well when there are multiple EVENT_ACCEPT
> events for one join request.
> This is because an update event only triggers the zookeeper driver to fetch an event from the queue.
> If the event is "accept", the driver handles it by calling sd_accept_handler() and moves to the next node.

How can we guarantee that more than one sheep fetch the accept event at
the same time?

> The following update event will trigger it to check whether there is a new event, but not to handle the EVENT_ACCEPT
> again.
> So sd_accept_handler() will be called only once for one join request in one sheep.
> If so, we can remove "master" from the zookeeper driver totally.

Even if that is true, we cannot remove the zk master because my patch
cannot handle concurrent joins of sheep with the zk driver.
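
The requirement from the join procedure quoted at the top (several
nodes may accept the same join request, yet each sheep must update its
member list only once) can be sketched in C as below.  This is a toy
model under stated assumptions, not the patch itself: toy_sheep,
toy_handle_accept() and the integer node ids are hypothetical, and
duplicates are detected simply by checking the member list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_NODES 8

/* Toy model only: names and layout are hypothetical, not sheepdog code. */
struct toy_sheep {
	int members[TOY_MAX_NODES];	/* node ids this sheep believes are members */
	size_t nr_members;
};

static bool toy_is_member(const struct toy_sheep *s, int node_id)
{
	for (size_t i = 0; i < s->nr_members; i++)
		if (s->members[i] == node_id)
			return true;
	return false;
}

/* Handle one accept event for the joining node.  Returns true if the
 * member list changed, false if this was a duplicate accept for a node
 * that is already a member. */
static bool toy_handle_accept(struct toy_sheep *s, int node_id)
{
	assert(s != NULL);

	if (toy_is_member(s, node_id))
		return false;

	assert(s->nr_members < TOY_MAX_NODES);
	s->members[s->nr_members++] = node_id;
	return true;
}
```

This only makes a repeated accept for one joining node harmless.  It
says nothing about the order in which different sheep see events, so it
does not address concurrent joins.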

Thanks,

Kazutaka