[Sheepdog] PATCH S003: Handle master crashing before sending JOIN request
Liu Yuan
namei.unix at gmail.com
Fri Apr 27 06:54:40 CEST 2012
Hi, Shevek
On 04/27/2012 08:15 AM, Shevek wrote:
>
> A problem arises if a node joins the cluster and generates a
> confchg event, then crashes or leaves without sending a join
> request and receiving a join response. The second node to join
> never becomes master, and the entire cluster hangs.
>
> This patch allows a node to detect whether it should promote itself
> to master after an arbitrary confchg event. Every node except the
> master creates a blocked JOIN event for every node that joined
> after itself, therefore the master is the node which has a JOIN
> event for every node in the members list.
>
> A following patch will handle the case where a join request
> is sent, but the master crashes before sending a join response.
>
Thanks for your patch.
I think commit c4e3559758b2e was dedicated to this problem: with it,
mastership is actually transferred to the second sheep. So I suspect
that the hang is caused by some other bug.
Is there a way to confirm or reproduce this hang reliably?
> There is a third outstanding issue if two clusters merge, also to be
> addressed in a following patch.
What kind of issue?
> + // Exactly one non-master member has seen join events for all other
> + // members, because events are ordered.
> + for (i = 0; i < member_list_entries; i++) {
> + struct cpg_node member = {
Please use /* */ for multi-line comments.
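Something like the sketch below shows the comment style I mean, applied to
the rule from the patch description (a node may promote itself once it has a
JOIN event for every other member). It is only an illustration and is not
taken from the patch: struct node_id, same_node() and seen_join_for_all() are
hypothetical names, and the real code works on struct cpg_node and the
driver's own event list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's node type (not struct cpg_node). */
struct node_id {
	unsigned int nodeid;
	unsigned int pid;
};

static bool same_node(const struct node_id *a, const struct node_id *b)
{
	return a->nodeid == b->nodeid && a->pid == b->pid;
}

/*
 * Return true if a JOIN event has been seen for every member other
 * than ourselves.  Per the patch comment, events are ordered, so
 * exactly one non-master member can satisfy this.
 */
static bool seen_join_for_all(const struct node_id *self,
			      const struct node_id *members, size_t nr_members,
			      const struct node_id *joined, size_t nr_joined)
{
	for (size_t i = 0; i < nr_members; i++) {
		bool found = false;

		/* We never queue a JOIN event for ourselves. */
		if (same_node(&members[i], self))
			continue;

		for (size_t j = 0; j < nr_joined; j++) {
			if (same_node(&members[i], &joined[j])) {
				found = true;
				break;
			}
		}
		if (!found)
			return false;
	}
	return true;
}
```

The point is only the comment form: one /* ... */ block with a leading " * "
on each continuation line, instead of stacked // lines.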
Thanks,
Yuan