[sheepdog] [PATCH] zookeeper: fix cluster hang by giving priority to process LEAVE event

Yunkai Zhang yunkai.me at gmail.com
Thu Jul 19 04:32:10 CEST 2012


On Thu, Jul 19, 2012 at 10:31 AM, Yunkai Zhang <yunkai.me at gmail.com> wrote:
> Does anyone review this path? I have tested it for several days in our

s/path/patch/

> testing environment, where it has passed more than 120 cases and works
> well for us.
>
> On Mon, Jul 16, 2012 at 1:07 PM, Yunkai Zhang <yunkai.me at gmail.com> wrote:
>> From: Yunkai Zhang <qiushu.zyk at taobao.com>
>>
>> V2:
>> - fix zk_queue_pop() when it's called by zk_unblock():
>>   continue to process block event when is_zk_unblock is True
>> -------------------------------------------------------- >8
>>
>> A cluster request may retry indefinitely after some sheep have left,
>> so cluster_op_done is never called, which causes the cluster to
>> hang.
>>
>> By giving priority to LEAVE events while there is an unfinished BLOCK
>> event, we fix this issue and still comply with a rule that I think is
>> very important for a distributed system:
>>
>> All sheep must process all events in the same order.
>>
>> Signed-off-by: Yunkai Zhang <qiushu.zyk at taobao.com>
>> ---
>>  sheep/cluster/zookeeper.c |   17 +++++++++++------
>>  1 file changed, 11 insertions(+), 6 deletions(-)
>>
>> diff --git a/sheep/cluster/zookeeper.c b/sheep/cluster/zookeeper.c
>> index 7bd20bd..e03fd22 100644
>> --- a/sheep/cluster/zookeeper.c
>> +++ b/sheep/cluster/zookeeper.c
>> @@ -71,6 +71,7 @@ static struct zk_event zk_levents[SD_MAX_NODES];
>>  static int nr_zk_levents;
>>  static unsigned zk_levent_head;
>>  static unsigned zk_levent_tail;
>> +static bool is_zk_unblock;
>>
>>  static void *zk_node_btroot;
>>  static struct zk_node *zk_master;
>> @@ -239,9 +240,11 @@ static int zk_queue_pop(zhandle_t *zh, struct zk_event *ev)
>>         struct zk_event *lev;
>>         eventfd_t value = 1;
>>
>> -       /* process leave event */
>> -       if (uatomic_read(&zk_notify_blocked) <= 0 &&
>> -            uatomic_read(&nr_zk_levents)) {
>> +       /*
>> +        * Continue to process LEAVE event even if
>> +        * we have an unfinished BLOCK event.
>> +        */
>> +       if (!is_zk_unblock && uatomic_read(&nr_zk_levents)) {
>>                 nr_levents = uatomic_sub_return(&nr_zk_levents, 1) + 1;
>>                 dprintf("nr_zk_levents:%d, head:%u\n", nr_levents, zk_levent_head);
>>
>> @@ -282,6 +285,9 @@ static int zk_queue_pop(zhandle_t *zh, struct zk_event *ev)
>>                 return 0;
>>         }
>>
>> +       if (!is_zk_unblock && uatomic_read(&zk_notify_blocked) > 0)
>> +               return -1;
>> +
>>         if (zk_queue_empty(zh))
>>                 return -1;
>>
>> @@ -618,7 +624,9 @@ static void zk_unblock(void *msg, size_t msg_len)
>>         struct zk_event ev;
>>         eventfd_t value = 1;
>>
>> +       is_zk_unblock = 1;
>>         rc = zk_queue_pop(zhandle, &ev);
>> +       is_zk_unblock = 0;
>>         assert(rc == 0);
>>
>>         ev.type = EVENT_NOTIFY;
>> @@ -656,9 +664,6 @@ static void zk_handler(int listen_fd, int events, void *data)
>>         if (ret < 0)
>>                 return;
>>
>> -       if (uatomic_read(&zk_notify_blocked) > 0)
>> -               return;
>> -
>>         ret = zk_queue_pop(zhandle, &ev);
>>         if (ret < 0)
>>                 goto out;
>> --
>> 1.7.10.4
>>
>
>
>
> --
> Yunkai Zhang
> Work at Taobao
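
For reviewers, the ordering that the patched zk_queue_pop() is meant to
follow can be sketched as a small standalone model. This is only an
illustration under simplifying assumptions, not the code in
sheep/cluster/zookeeper.c: struct zk_model, model_pop() and the POP_*
values are hypothetical names, plain counters stand in for the uatomic
variables and the ZooKeeper queue, and a pop simply removes one entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes for the model; not part of sheepdog. */
enum pop_result { POP_NONE = -1, POP_LEAVE = 0, POP_QUEUE = 1 };

/* Hypothetical, simplified stand-in for the driver's global state. */
struct zk_model {
	int nr_levents;     /* pending local LEAVE events */
	int nr_queued;      /* pending events in the shared queue */
	int notify_blocked; /* > 0 while a BLOCK event is unfinished */
};

static enum pop_result model_pop(struct zk_model *m, bool is_unblock)
{
	assert(m != NULL);

	/* LEAVE events go first, even while a BLOCK event is unfinished. */
	if (!is_unblock && m->nr_levents > 0) {
		m->nr_levents--;
		return POP_LEAVE;
	}

	/* While blocked, only the unblock path may touch the queue. */
	if (!is_unblock && m->notify_blocked > 0)
		return POP_NONE;

	if (m->nr_queued == 0)
		return POP_NONE;

	m->nr_queued--;
	return POP_QUEUE;
}
```

In this model, a caller that is blocked still drains LEAVE events, gets
nothing else until the unblock path has run, and then resumes normal
queue processing.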



-- 
Yunkai Zhang
Work at Taobao