[sheepdog] [PATCH] zookeeper: fix cluster hang by giving priority to process LEAVE event
Yunkai Zhang
yunkai.me at gmail.com
Thu Jul 19 05:42:53 CEST 2012
On Thu, Jul 19, 2012 at 11:20 AM, Liu Yuan <namei.unix at gmail.com> wrote:
> On 07/19/2012 11:10 AM, Yunkai Zhang wrote:
>> That is the point: they are different!
>>
>> The zookeeper driver gives priority to processing LEAVE events *only*
>> when there is an unfinished BLOCK event. Because of this difference,
>> all sheep will process each message in the same order, but this rule
>> is broken in the corosync driver.
>
> They differ in degree of relaxation, but compared with strict ordering
> they are the same: the order is relaxed. You relax the order when the
> queue is blocked.
>
> We don't need this rule in the corosync driver, and it is not a rule for
> sheepdog: we don't blindly stick to any convention unless it is proven
> necessary. Relaxed ordering is very common in distributed systems; for
> example, eventual consistency relaxes read/write ordering.
>
> What really matters is whether relaxing still provides correct behavior.
> As far as corosync is concerned, this relaxation is correct and gives us
> a benefit: once confchg is handled at the highest priority, we reduce
> the number of wrong read/write requests with an epoch mismatch to a very
> low level, compared with strict ordering.
What I mean is that corosync could also give confchg events priority only
when there is an unfinished BLOCK event (not including other notify
events). But as you know, I haven't found a test case that proves what I
am worried about, so let's shelve this controversy.
>
> Thanks,
> Yuan
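To make the difference under discussion concrete, here is a minimal Python model of the two dispatch rules. It is a sketch under stated assumptions, not sheepdog code: the names `Event`, `pick` and `dispatch_order`, the event kinds, and the list-based queue are all hypothetical. The "conditional" policy follows the zookeeper rule as described above (a LEAVE jumps the queue only while a BLOCK event is unfinished); the "always" policy models handling membership changes (confchg) at the highest priority, as described for corosync.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    kind: str  # hypothetical kinds: "BLOCK", "NOTIFY", "JOIN", "LEAVE"
    name: str


# Hypothetical: event kinds delivered as part of a membership change (confchg).
MEMBERSHIP = {"JOIN", "LEAVE"}


def pick(queue: List[Event], blocked: bool, policy: str) -> Optional[int]:
    """Return the index of the next event to process, or None if none may run.

    When blocked is True, queue[0] is assumed to be an unfinished BLOCK event.
    """
    if policy == "conditional":
        # zookeeper-style rule as described in the thread:
        # only a LEAVE may jump the queue, and only while blocked.
        if blocked:
            return next((i for i, ev in enumerate(queue) if ev.kind == "LEAVE"), None)
        return 0 if queue else None
    if policy == "always":
        # corosync-style rule as described in the thread:
        # membership changes are always handled before anything else.
        for i, ev in enumerate(queue):
            if ev.kind in MEMBERSHIP:
                return i
        if blocked:
            return None
        return 0 if queue else None
    raise ValueError(f"unknown policy: {policy}")


def dispatch_order(events: List[Event], policy: str, blocked: bool = False) -> List[str]:
    """Drain the queue and return event names in processing order.

    blocked=True models a BLOCK at the head whose work never finishes
    (for example because its sender left), so it is never dequeued.
    """
    queue = list(events)
    order = []
    while True:
        i = pick(queue, blocked, policy)
        if i is None:
            return order
        order.append(queue.pop(i).name)


if __name__ == "__main__":
    plain = [Event("NOTIFY", "a"), Event("LEAVE", "x"), Event("NOTIFY", "b")]
    print(dispatch_order(plain, "conditional"))  # ['a', 'x', 'b']
    print(dispatch_order(plain, "always"))       # ['x', 'a', 'b']
    stuck = [Event("BLOCK", "blk")] + plain
    print(dispatch_order(stuck, "conditional", blocked=True))  # ['x']
```

In this model, when nothing is blocked the conditional rule keeps arrival order while the "always" rule moves the LEAVE to the front. While a BLOCK is stuck at the head, both rules let the LEAVE through. That is the behavior the patch subject relies on to avoid the cluster hang.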
--
Yunkai Zhang
Work at Taobao