On 10/31/2011 07:49 PM, MORITA Kazutaka wrote:
> At Mon, 31 Oct 2011 19:37:51 +0800,
> Liu Yuan wrote:
>>
>> On 10/31/2011 07:10 PM, MORITA Kazutaka wrote:
>>
>>> At Mon, 31 Oct 2011 18:15:06 +0800,
>>> Liu Yuan wrote:
>>>>
>>>> On 10/31/2011 06:00 PM, MORITA Kazutaka wrote:
>>>>
>>>>> At Tue, 25 Oct 2011 15:06:05 +0800,
>>>>> Liu Yuan wrote:
>>>>>>
>>>>>> On 10/25/2011 02:55 PM, zituan at taobao.com wrote:
>>>>>>
>>>>>>> From: Yibin Shen <zituan at taobao.com>
>>>>>>>
>>>>>>> In some situation, sheep may disconnected from corosync instantaneously,
>>>>>>> at the same time, both sheep and corosync will keep running but
>>>>>>> none of them exit, then the disconnected sheep may receive a confchg
>>>>>>> message from corosync which notify this sheep has left.
>>>>>>> that will lead to a network partition, this patch fix it.
>>>>>>>
>>>>>>> Signed-off-by: Yibin Shen <zituan at taobao.com>
>>>>>>> ---
>>>>>>>  sheep/group.c |    3 +++
>>>>>>>  1 files changed, 3 insertions(+), 0 deletions(-)
>>>>>>>
>>>>>>> diff --git a/sheep/group.c b/sheep/group.c
>>>>>>> index e22dabc..ab5a9f0 100644
>>>>>>> --- a/sheep/group.c
>>>>>>> +++ b/sheep/group.c
>>>>>>> @@ -1467,6 +1467,9 @@ static void sd_leave_handler(struct sheepdog_node_list_entry *left,
>>>>>>>  	struct work_leave *w = NULL;
>>>>>>>  	int i, size;
>>>>>>>
>>>>>>> +	if (node_cmp(left, &sys->this_node) == 0)
>>>>>>> +		panic("BUG: this node can't be on the left list\n");
>>>>>>> +
>>>>>>
>>>>>>
>>>>>> Hmm, the panic output looks confusing. How about "Network Partition Bug:
>>>>>> I should have exited.\n"? The output will be seen by
>>>>>> administrators, not only programmers.
>>>>>
>>>>> Applied after modifying output text, thanks!
>>>>>
>>>>> Kazutaka
>>>>
>>>>
>>>> Kazutaka,
>>>> Maybe we should not panic out when it becomes a single node cluster.
>>>> The node will change into HALT state, which doesn't do any harm to its data.
>>>
>>> It is much better. Currently, Sheepdog kills a minority cluster in
>>> __sd_leave() when network partition occurs because it is the simplest
>>> solution to keep data consistency.
>>>
>>> But this looks like a different issue from this patch. Does corosync
>>> include the local node in the left list when network partition occurs? If
>>> so, we should handle it in the corosync cluster driver because it
>>> looks like a corosync-specific issue to me.
>>>
>>
>> I am not sure, but if corosync includes the local node in the left list, it
>> should be a bug in corosync.
>>
>> Let's assume three nodes (a,b,c).
>> I suspect that the left message is for n(b,c), but after n(a)
>> rejoins, for whatever reason, the message is being broadcast, and
>> n(a) just gets it wrongly.
>
> IIUC, n(a) should receive the left message of n(b,c).

Yes, but n(a) should not receive leave_message(a), which is intended
for n(b,c). So the correct sequence should be:

  network partition happens
  lm(a) -> n(b,c), lm(b,c) -> n(a)
  then n(a) rejoins
  jm(a) -> n(a,b,c)

Anyway, I am not sure, because I didn't look at the log.

Thanks,
Yuan
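
To make the sequence above concrete, here is a small standalone sketch
of it. This is not sheepdog or corosync code: the view table, the node
constants, and the deliver_*() helpers are all hypothetical names made
up for illustration. It also assumes that, on the merge, n(a) sees b and
c come back, which the sequence above does not spell out. The only point
is that a leave message naming the receiver itself is treated as stale
and ignored.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_NODES 3	/* hypothetical cluster: a, b, c */

enum { NODE_A, NODE_B, NODE_C };

/* view[i][j]: node i currently believes node j is a member. */
static bool view[NR_NODES][NR_NODES];

static void reset_views(void)
{
	for (int i = 0; i < NR_NODES; i++)
		for (int j = 0; j < NR_NODES; j++)
			view[i][j] = true;
}

/*
 * Deliver lm(left) to receiver.  A leave message that names the
 * receiver itself is treated as stale and dropped; returns false then.
 */
static bool deliver_leave(int receiver, int left)
{
	assert(receiver >= 0 && receiver < NR_NODES);
	assert(left >= 0 && left < NR_NODES);

	if (receiver == left)
		return false;
	view[receiver][left] = false;
	return true;
}

/* Deliver jm(joined) to receiver. */
static void deliver_join(int receiver, int joined)
{
	view[receiver][joined] = true;
}

static int count_members(int node)
{
	int n = 0;

	for (int j = 0; j < NR_NODES; j++)
		n += view[node][j];
	return n;
}

/*
 * Replay the sequence, then inject the suspected stale lm(a) at n(a).
 * Returns how many members n(a) sees at the end.
 */
static int replay_sequence(void)
{
	reset_views();

	/* network partition: lm(a) -> n(b,c), lm(b,c) -> n(a) */
	deliver_leave(NODE_B, NODE_A);
	deliver_leave(NODE_C, NODE_A);
	deliver_leave(NODE_A, NODE_B);
	deliver_leave(NODE_A, NODE_C);

	/* n(a) rejoins: jm(a) -> n(a,b,c) */
	for (int r = 0; r < NR_NODES; r++)
		deliver_join(r, NODE_A);

	/* assumption of this sketch: n(a) also sees b and c come back */
	deliver_join(NODE_A, NODE_B);
	deliver_join(NODE_A, NODE_C);

	/* the stale lm(a) reaches n(a): dropped, view unchanged */
	deliver_leave(NODE_A, NODE_A);

	return count_members(NODE_A);
}
```

With these rules replay_sequence() returns 3, i.e. n(a) still sees the
full cluster after the stale message. Whether the real fix belongs in
sd_leave_handler() or in the corosync driver is a separate question.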