[sheepdog] [PATCH] zookeeper: fix cluster hang by giving priority to process LEAVE event

Thu Jul 19 05:10:10 CEST 2012

On Thu, Jul 19, 2012 at 10:45 AM, Liu Yuan <namei.unix at gmail.com> wrote:
> On 07/16/2012 01:07 PM, Yunkai Zhang wrote:
>> From: Yunkai Zhang <qiushu.zyk at taobao.com>
>>
>> V2:
>> - fix zk_queue_pop() when it's called by zk_unblock():
>>   continue to process block event when is_zk_unblock is True
>> -------------------------------------------------------- >8
>>
>> As cluster request may retry infinitely when some sheeps left, than
>> cluster_op_done could not to be called forever, so it will cause cluster
>> hang problem.
>>
>> By giving priority to process LEAVE event when there is unfinished BLOCK
>> event, we can fix this issue, but also comply with the rule which is very
>> important for distributed system I think:
>>
>> All sheeps should process all events in the same order.
>>
>
> This actually relax the order though not as aggressive as the patch in
> corosync does.

That is the point, they are different!

Zookeeper driver *just* giving priority to process LEAVE event, only
when there is unfinished BLOCK event. By this difference, all sheep
will process each message in the same order, but this rule will be
broken in corosync driver.

>
>
>> Signed-off-by: Yunkai Zhang <qiushu.zyk at taobao.com>
>> ---
>>  sheep/cluster/zookeeper.c |   17 +++++++++++------
>>  1 file changed, 11 insertions(+), 6 deletions(-)
>>
>> diff --git a/sheep/cluster/zookeeper.c b/sheep/cluster/zookeeper.c
>> index 7bd20bd..e03fd22 100644
>> --- a/sheep/cluster/zookeeper.c
>> +++ b/sheep/cluster/zookeeper.c
>> @@ -71,6 +71,7 @@ static struct zk_event zk_levents[SD_MAX_NODES];
>>  static int nr_zk_levents;
>>  static unsigned zk_levent_head;
>>  static unsigned zk_levent_tail;
>> +static bool is_zk_unblock;
>>
>>  static void *zk_node_btroot;
>>  static struct zk_node *zk_master;
>> @@ -239,9 +240,11 @@ static int zk_queue_pop(zhandle_t *zh, struct zk_event *ev)
>>       struct zk_event *lev;
>>       eventfd_t value = 1;
>>
>> -     /* process leave event */
>> -     if (uatomic_read(&zk_notify_blocked) <= 0 &&
>> -          uatomic_read(&nr_zk_levents)) {
>> +     /*
>> +      * Continue to process LEAVE event even if
>> +      * we have an unfinished BLOCK event.
>> +      */
>> +     if (!is_zk_unblock && uatomic_read(&nr_zk_levents)) {
>>               nr_levents = uatomic_sub_return(&nr_zk_levents, 1) + 1;
>>               dprintf("nr_zk_levents:%d, head:%u\n", nr_levents, zk_levent_head);
>>
>> @@ -282,6 +285,9 @@ static int zk_queue_pop(zhandle_t *zh, struct zk_event *ev)
>>               return 0;
>>       }
>>
>> +     if (!is_zk_unblock && uatomic_read(&zk_notify_blocked) > 0)
>> +             return -1;
>> +
>
> both is_zk_unblock and zk_notify_blocked are blocking related, this need
> at least a documentation what each variables means, if you can't manage

They are two things, I can't fold them.

1) I'm not against to rename is_zk_unblock, when is_zk_unblock is
true, it means zk_queue_pop() was called by zk_unblock(). Maybe we can
rename it to called_by_zk_unblock.

2) zk_notify_blocked, means whether there is unfunished BLOCK event.

> to fold these two into one. Also, for the variable itself, better to
> rename is_zk_unblock as zk_blocked or similar. Having one var
> zk_notify_blocked and the other is_zk_unblock will cause a lot of confusion.
>
> Thanks,
> Yuan
>

-- 
Yunkai Zhang
Work at Taobao