[sheepdog] [PATCH v3] corosync: fix cluster hang by cluster requests blocking confchg

Liu Yuan namei.unix at gmail.com
Thu Jul 5 17:13:45 CEST 2012


On 07/05/2012 11:08 PM, Yunkai Zhang wrote:
> Yes, leave events are delivered by corosync to sheep one by one.
> 
> But if you add each leave event to the head of the list, the order in
> which sheep processes them depends on when sheep reads from
> corosync_event_list. The time spent processing a confchg event may
> differ between sheep, so the order may be broken.
> 
> If we need to give priority to leave events and keep the same
> processing order in every sheep, we can add each leave event in front
> of all other events but keep the leave events in their delivered
> order in corosync_event_list.

This is what my v2 does. It seems that even with this method, we can't
keep the order between join and leave events in some corner cases.
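
To illustrate, here is a minimal sketch of the "leave events first, but in
delivered order" idea. It is not the actual v2 patch: the types and the
names ev_list_add() and ev_list_pop() are hypothetical, and the real
corosync_event_list code differs. It shows how a leave event overtakes a
join that was delivered before it, which is where the join/leave order
gets lost.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event kinds; not the real sheepdog definitions. */
enum ev_type { EV_JOIN, EV_LEAVE, EV_NOTIFY };

struct ev {
	enum ev_type type;
	int seq;		/* order in which corosync delivered it */
	struct ev *next;
};

struct ev_list {
	struct ev *head;
	struct ev *last_leave;	/* end of the leading run of leaves, or NULL */
	struct ev *tail;
};

/* Queue an event: leaves go before all other events, after earlier leaves. */
static void ev_list_add(struct ev_list *l, struct ev *e)
{
	assert(l && e);
	e->next = NULL;

	if (e->type == EV_LEAVE) {
		if (l->last_leave) {
			/* keep delivered order among leave events */
			e->next = l->last_leave->next;
			l->last_leave->next = e;
		} else {
			e->next = l->head;
			l->head = e;
		}
		l->last_leave = e;
		if (!e->next)
			l->tail = e;
	} else {
		/* everything else is plain FIFO */
		if (l->tail)
			l->tail->next = e;
		else
			l->head = e;
		l->tail = e;
	}
}

/* Take the next event to process, or NULL if the list is empty. */
static struct ev *ev_list_pop(struct ev_list *l)
{
	struct ev *e = l->head;

	if (!e)
		return NULL;
	l->head = e->next;
	if (l->last_leave == e)
		l->last_leave = NULL;
	if (!l->head)
		l->tail = NULL;
	e->next = NULL;
	return e;
}
```

If a join (seq 1) is queued and then two leaves (seq 2, 3) arrive before the
join is popped, this sheep processes 2, 3, 1. A sheep that had already
popped the join processes 1, 2, 3, so the two sheep disagree.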

I am considering using separate lists for notification and confchg events.
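
A rough sketch of what separate lists could look like. All names here
(struct cev_queues, cev_enqueue(), cev_next()) are hypothetical, and the
dispatch policy of draining confchg events before notifications is only an
assumption for illustration, not a settled design. The point is that join
and leave events share one FIFO and so keep their delivered order, while
notifications can no longer sit in front of them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification; joins and leaves are both CEV_CONFCHG. */
enum cev_class { CEV_CONFCHG, CEV_NOTIFY };

struct cev {
	enum cev_class cls;
	int id;
	struct cev *next;
};

struct fifo {
	struct cev *head, *tail;
};

struct cev_queues {
	struct fifo confchg;	/* join/leave events, in delivered order */
	struct fifo notify;	/* notification events, in delivered order */
};

static void fifo_push(struct fifo *q, struct cev *e)
{
	e->next = NULL;
	if (q->tail)
		q->tail->next = e;
	else
		q->head = e;
	q->tail = e;
}

static struct cev *fifo_pop(struct fifo *q)
{
	struct cev *e = q->head;

	if (!e)
		return NULL;
	q->head = e->next;
	if (!q->head)
		q->tail = NULL;
	e->next = NULL;
	return e;
}

/* Route an event to the list that matches its class. */
static void cev_enqueue(struct cev_queues *qs, struct cev *e)
{
	assert(qs && e);
	fifo_push(e->cls == CEV_CONFCHG ? &qs->confchg : &qs->notify, e);
}

/* Assumed policy: pending confchg events are handled first. */
static struct cev *cev_next(struct cev_queues *qs)
{
	struct cev *e = fifo_pop(&qs->confchg);

	return e ? e : fifo_pop(&qs->notify);
}
```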

Thanks,
Yuan



