[sheepdog] [PATCH v3] corosync: fix cluster hang by cluster requests blocking confchg

Yunkai Zhang yunkai.me at gmail.com
Thu Jul 5 17:08:24 CEST 2012


On Thu, Jul 5, 2012 at 8:35 PM, Liu Yuan <namei.unix at gmail.com> wrote:
> On 07/05/2012 08:33 PM, Liu Yuan wrote:
>> On 07/05/2012 07:57 PM, MORITA Kazutaka wrote:
>>> Corosync delivers confchg events even if there remain the previous
>>> events in corosync_event_list, doesn't it?  Adding events to the head
>>> of the queue breaks the guarantee that all the nodes must receive
>>> confchg events in the same order, I think.
>>
>> If corosync guarantee the delivery order of confchg, so we can reorder
>> it at our will and the final order seen by sheep core will be the same I
>> think, doesn't it?
>>
>
> In the assumption that leave event is delivered and processed one by
> one. I think current implementation does it one by one for leave event.

Yes, corosync delivers leave events to sheep one by one.

But if you add leave events to the head of the list, the order in which
a sheep processes them depends on when it reads them from
corosync_event_list. The time spent processing a confchg event may
differ from one sheep to another, so the order may be broken.

If we need to give priority to leave events and still keep the same
processing order on every sheep, we can add each leave event in front
of all other events but keep the leave events themselves in their
delivered order in corosync_event_list.
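
To make the idea concrete, here is a rough sketch of that insertion
rule. It is not the sheepdog code: struct event, struct event_queue,
the EVENT_* constants and queue_event() are all hypothetical names I
made up for illustration, and a plain singly linked list stands in for
corosync_event_list. It only shows that a new leave event lands behind
the leave events already queued and ahead of everything else.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event kinds, not the real sheepdog definitions. */
enum event_type { EVENT_JOIN, EVENT_LEAVE, EVENT_NOTIFY };

struct event {
	enum event_type type;
	int sender;		/* illustrative node id */
	struct event *next;
};

/* Stand-in for corosync_event_list (assumption: singly linked). */
struct event_queue {
	struct event *head;
};

/*
 * Queue an event.
 *
 * Leave events are inserted after the leading run of leave events
 * that are already queued, so they stay in delivered order but are
 * processed before any other kind of event.  All other events are
 * appended to the tail.
 */
static void queue_event(struct event_queue *q, struct event *ev)
{
	struct event **pos;

	assert(q != NULL && ev != NULL);

	pos = &q->head;
	if (ev->type == EVENT_LEAVE) {
		/* Skip the leave events queued earlier. */
		while (*pos && (*pos)->type == EVENT_LEAVE)
			pos = &(*pos)->next;
	} else {
		/* Walk to the end of the list. */
		while (*pos)
			pos = &(*pos)->next;
	}

	ev->next = *pos;
	*pos = ev;
}
```

With this rule, queuing notify(1), leave(2), join(3), leave(4) in that
order leaves the list as leave(2), leave(4), notify(1), join(3).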


>
> Thanks,
> Yuan
>
>
> --
> sheepdog mailing list
> sheepdog at lists.wpkg.org
> http://lists.wpkg.org/mailman/listinfo/sheepdog



-- 
Yunkai Zhang
Work at Taobao


