On 05/11/2012 06:40 AM, Shevek wrote:
> There are a number of races around join/leave.
>
> This patch fixes a case described by Huxinwei and Liu Yuan where
> sheepdog fails to elect a master. A longer description is in the patch.
>
> A problem arises if a node joins the cluster and generates a
> confchg event, then crashes or leaves without sending a join
> request and receiving a join response. The second node to join
> never becomes master, and the entire cluster hangs.
>
> This patch allows a node to detect whether it should promote itself
> to master after an arbitrary confchg event. Every node except the
> master creates a blocked JOIN event for every node that joined
> after itself, therefore the master is the node which has a JOIN
> event for every node in the members list.
>
> A following patch will handle the case where a join request
> is sent, but the master crashes before sending a join response.
>
> Changes to this patch have been made as requested by Liu Yuan.

Hi Shevek,

I ran script/checkpatch.pl against your patch and got 3 warnings.
Please fix them.

Thanks,
Yuan
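
For reference, the promotion rule in the quoted description ("the master is the node which has a JOIN event for every node in the members list") could be sketched roughly as below. This is only an illustration, not the code from the patch: struct event, has_join_event() and should_promote_to_master() are hypothetical names, and the sketch assumes the members list includes the local node, so a JOIN event for the node itself is expected as well.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration; not sheepdog's actual structures. */
enum event_type { EVENT_JOIN, EVENT_LEAVE };

struct event {
	enum event_type type;
	uint32_t node_id;	/* node this event refers to */
};

/* Return true if a JOIN event for node_id is queued locally. */
static bool has_join_event(const struct event *events, size_t nr_events,
			   uint32_t node_id)
{
	for (size_t i = 0; i < nr_events; i++)
		if (events[i].type == EVENT_JOIN &&
		    events[i].node_id == node_id)
			return true;
	return false;
}

/*
 * Called after a confchg event. The local node should promote itself
 * only if it holds a JOIN event for every node in the members list.
 * A node that joined later is missing JOIN events for the older
 * members, so the check fails for it.
 */
bool should_promote_to_master(const uint32_t *members, size_t nr_members,
			      const struct event *events, size_t nr_events)
{
	if (nr_members == 0)
		return false;

	for (size_t i = 0; i < nr_members; i++)
		if (!has_join_event(events, nr_events, members[i]))
			return false;
	return true;
}
```

Because the check depends only on the current members list and the locally queued events, it can be repeated after any confchg event, including one caused by a node that crashed before sending its join request.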