[sheepdog] [PATCH] zookeeper: fix cluster hang by giving priority to process LEAVE event

Yunkai Zhang yunkai.me at gmail.com
Thu Jul 19 05:42:53 CEST 2012


On Thu, Jul 19, 2012 at 11:20 AM, Liu Yuan <namei.unix at gmail.com> wrote:
> On 07/19/2012 11:10 AM, Yunkai Zhang wrote:
>> That is the point, they are different!
>>
>> The zookeeper driver gives priority to LEAVE events *only* when there
>> is an unfinished BLOCK event. Because of this difference, all sheep
>> will process every message in the same order, but this rule is
>> broken in the corosync driver.
>
> They differ in the degree of relaxation, but compared with strict
> ordering they are the same: the order is relaxed. You relax the order
> when the queue is blocked.
>
> We don't need this rule in the corosync driver; it is not a rule for
> sheepdog: we don't blindly stick to any stereotype unless it is proven
> necessary. Relaxed ordering is very common in distributed systems; for
> example, eventual consistency relaxes read/write ordering.
>
> What really matters is whether the relaxation still provides correct
> behavior. As far as corosync is concerned, this relaxation is correct
> and gives us a benefit: once confchg is handled with the highest
> priority, read/write requests that fail with an epoch mismatch become
> far rarer than they would be under strict ordering.

What I mean is that corosync could also give confchg events priority
only when there is an unfinished BLOCK event (and not when other
notify events are pending).
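To make the two rules concrete, here is a minimal C sketch of how a driver might pick the next event from its queue. It is written against a hypothetical event queue: `struct event`, `next_event`, `PRIORITY_WHEN_BLOCKED` and `PRIORITY_ALWAYS` are illustrative names, not sheepdog's actual driver code, and EVENT_LEAVE stands in for a membership change (LEAVE in zookeeper, confchg in corosync). With PRIORITY_WHEN_BLOCKED, events are taken in strict FIFO order and a membership change may overtake only an unfinished BLOCK at the head; with PRIORITY_ALWAYS, a membership change jumps ahead of everything queued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event types; not sheepdog's real definitions. */
enum event_type { EVENT_JOIN, EVENT_LEAVE, EVENT_BLOCK, EVENT_NOTIFY };

struct event {
    enum event_type type;
    int id;
};

/* Hypothetical policies contrasted in this thread. */
enum policy {
    PRIORITY_WHEN_BLOCKED, /* membership change overtakes only an unfinished BLOCK */
    PRIORITY_ALWAYS        /* membership change always goes first */
};

/*
 * Return the index of the next event to process from FIFO queue q of
 * length n, or -1 if nothing may be processed yet.
 * head_blocked: the head of the queue is a BLOCK event that has not
 * finished (e.g. still waiting for its unblock).
 */
static int next_event(const struct event *q, size_t n, bool head_blocked,
                      enum policy p)
{
    assert(q != NULL || n == 0);
    if (n == 0)
        return -1;

    /* Relaxed case: look for the earliest membership change. */
    if (p == PRIORITY_ALWAYS || head_blocked) {
        for (size_t i = 0; i < n; i++)
            if (q[i].type == EVENT_LEAVE)
                return (int)i;
    }

    /* Blocked head and no membership change queued: wait. */
    if (head_blocked)
        return -1;

    /* Otherwise keep strict FIFO order. */
    return 0;
}
```

For a queue of BLOCK, NOTIFY, LEAVE with the BLOCK unfinished, both policies pick the LEAVE; for a queue of NOTIFY, LEAVE with nothing blocked, PRIORITY_WHEN_BLOCKED keeps strict order and picks the NOTIFY, while PRIORITY_ALWAYS picks the LEAVE first.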

But as you know, I haven't found a test case that demonstrates what I'm
worried about, so let's shelve this controversy.

>
> Thanks,
> Yuan



-- 
Yunkai Zhang
Work at Taobao


