Hi, Shevek

On 04/27/2012 08:15 AM, Shevek wrote:
>
> A problem arises if a node joins the cluster and generates a
> confchg event, then crashes or leaves without sending a join
> request and receiving a join response. The second node to join
> never becomes master, and the entire cluster hangs.
>
> This patch allows a node to detect whether it should promote itself
> to master after an arbitrary confchg event. Every node except the
> master creates a blocked JOIN event for every node that joined
> after itself, therefore the master is the node which has a JOIN
> event for every node in the members list.
>
> A following patch will handle the case where a join request
> is sent, but the master crashes before sending a join response.
>

Thanks for your patch.

I think commit c4e3559758b2e is dedicated to this problem. Mastership
is actually transferred to the second sheep, so I suspect that the hang
is caused by another bug. Is there a way to confirm or reproduce this
hang reliably?

> There is a third outstanding issue if two clusters merge, also to be
> addressed in a following patch.

What kind of issue?

+	// Exactly one non-master member has seen join events for all other
+	// members, because events are ordered.
+	for (i = 0; i < member_list_entries; i++) {
+		struct cpg_node member = {

Please use /* */ for multi-line comments.

Thanks,
Yuan