[sheepdog] [PATCH] zookeeper: fix cluster hang by giving priority to process LEAVE event
Yunkai Zhang
yunkai.me at gmail.com
Thu Jul 19 04:32:10 CEST 2012
On Thu, Jul 19, 2012 at 10:31 AM, Yunkai Zhang <yunkai.me at gmail.com> wrote:
> Has anyone reviewed this path? I have tested it for several days in our
s/path/patch/
> testing environment; it has passed more than 120 cases and works well
> for us.
>
> On Mon, Jul 16, 2012 at 1:07 PM, Yunkai Zhang <yunkai.me at gmail.com> wrote:
>> From: Yunkai Zhang <qiushu.zyk at taobao.com>
>>
>> V2:
>> - fix zk_queue_pop() when it is called by zk_unblock():
>> continue to process the BLOCK event when is_zk_unblock is true
>> -------------------------------------------------------- >8
>>
>> A cluster request may retry indefinitely after some sheep have left. In
>> that case cluster_op_done is never called, and the cluster hangs.
>>
>> Giving priority to processing LEAVE events while there is an unfinished
>> BLOCK event fixes this issue, and it still complies with a rule that I
>> think is very important for a distributed system:
>>
>> All sheep should process all events in the same order.
>>
>> Signed-off-by: Yunkai Zhang <qiushu.zyk at taobao.com>
>> ---
>> sheep/cluster/zookeeper.c | 17 +++++++++++------
>> 1 file changed, 11 insertions(+), 6 deletions(-)
>>
>> diff --git a/sheep/cluster/zookeeper.c b/sheep/cluster/zookeeper.c
>> index 7bd20bd..e03fd22 100644
>> --- a/sheep/cluster/zookeeper.c
>> +++ b/sheep/cluster/zookeeper.c
>> @@ -71,6 +71,7 @@ static struct zk_event zk_levents[SD_MAX_NODES];
>> static int nr_zk_levents;
>> static unsigned zk_levent_head;
>> static unsigned zk_levent_tail;
>> +static bool is_zk_unblock;
>>
>> static void *zk_node_btroot;
>> static struct zk_node *zk_master;
>> @@ -239,9 +240,11 @@ static int zk_queue_pop(zhandle_t *zh, struct zk_event *ev)
>> struct zk_event *lev;
>> eventfd_t value = 1;
>>
>> - /* process leave event */
>> - if (uatomic_read(&zk_notify_blocked) <= 0 &&
>> - uatomic_read(&nr_zk_levents)) {
>> + /*
>> + * Continue to process LEAVE event even if
>> + * we have an unfinished BLOCK event.
>> + */
>> + if (!is_zk_unblock && uatomic_read(&nr_zk_levents)) {
>> nr_levents = uatomic_sub_return(&nr_zk_levents, 1) + 1;
>> dprintf("nr_zk_levents:%d, head:%u\n", nr_levents, zk_levent_head);
>>
>> @@ -282,6 +285,9 @@ static int zk_queue_pop(zhandle_t *zh, struct zk_event *ev)
>> return 0;
>> }
>>
>> + if (!is_zk_unblock && uatomic_read(&zk_notify_blocked) > 0)
>> + return -1;
>> +
>> if (zk_queue_empty(zh))
>> return -1;
>>
>> @@ -618,7 +624,9 @@ static void zk_unblock(void *msg, size_t msg_len)
>> struct zk_event ev;
>> eventfd_t value = 1;
>>
>> + is_zk_unblock = 1;
>> rc = zk_queue_pop(zhandle, &ev);
>> + is_zk_unblock = 0;
>> assert(rc == 0);
>>
>> ev.type = EVENT_NOTIFY;
>> @@ -656,9 +664,6 @@ static void zk_handler(int listen_fd, int events, void *data)
>> if (ret < 0)
>> return;
>>
>> - if (uatomic_read(&zk_notify_blocked) > 0)
>> - return;
>> -
>> ret = zk_queue_pop(zhandle, &ev);
>> if (ret < 0)
>> goto out;
>> --
>> 1.7.10.4
>>
>
>
>
> --
> Yunkai Zhang
> Work at Taobao
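
For anyone reviewing, the ordering that the patch gives zk_queue_pop() can
be sketched as a small standalone model. This is only a sketch under
simplifying assumptions, not the code from the patch: the struct, the enum,
and every sketch_* name are hypothetical, plain ints stand in for the
uatomic counters, and the ZooKeeper queue is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes; the real zk_queue_pop() returns 0 or -1. */
enum sketch_result { SKETCH_NONE = -1, SKETCH_LEAVE = 0, SKETCH_QUEUED = 1 };

/* Simplified stand-ins for the state kept in sheep/cluster/zookeeper.c. */
struct sketch_state {
	int nr_leave_events;  /* pending local LEAVE events (nr_zk_levents) */
	int notify_blocked;   /* > 0 while a BLOCK event is unfinished */
	int nr_queued;        /* events waiting in the shared queue */
	bool from_unblock;    /* plays the role of is_zk_unblock */
};

static enum sketch_result sketch_pop(struct sketch_state *s)
{
	assert(s != NULL);

	/* LEAVE events go first, even with an unfinished BLOCK event. */
	if (!s->from_unblock && s->nr_leave_events > 0) {
		s->nr_leave_events--;
		return SKETCH_LEAVE;
	}

	/* Everything else waits while a BLOCK event is unfinished. */
	if (!s->from_unblock && s->notify_blocked > 0)
		return SKETCH_NONE;

	if (s->nr_queued == 0)
		return SKETCH_NONE;

	s->nr_queued--;
	return SKETCH_QUEUED;
}

/* The unblock path bypasses both checks for the duration of one pop. */
static enum sketch_result sketch_unblock_pop(struct sketch_state *s)
{
	enum sketch_result r;

	s->from_unblock = true;
	r = sketch_pop(s);
	s->from_unblock = false;
	return r;
}
```

In this model, a state with one pending LEAVE event, an unfinished BLOCK
event, and two queued events yields the LEAVE event on the first
sketch_pop() call and nothing on the second, while sketch_unblock_pop()
still reaches the queued event.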
--
Yunkai Zhang
Work at Taobao