[sheepdog] [PATCH V3] zookeeper: fix cluster hang by giving priority to process LEAVE event
Yunkai Zhang
yunkai.me at gmail.com
Thu Jul 19 05:45:07 CEST 2012
From: Yunkai Zhang <qiushu.zyk at taobao.com>
V3:
- rename is_zk_unblock to called_by_zk_unblock, which is more
descriptive.
V2:
- fix zk_queue_pop() when it is called by zk_unblock():
continue to process the BLOCK event when is_zk_unblock is true
-------------------------------------------------------- >8
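A cluster request may be retried indefinitely when some sheep have left
the cluster. While its BLOCK event is unfinished (zk_notify_blocked > 0),
zk_handler() stops popping events, so the pending LEAVE events are never
processed, cluster_op_done() is never called, and the cluster hangs.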
Giving priority to LEAVE events while a BLOCK event is still unfinished
fixes this hang, while still complying with a rule that I think is very
important for a distributed system:
All sheep should process all events in the same order.
Signed-off-by: Yunkai Zhang <qiushu.zyk at taobao.com>
---
sheep/cluster/zookeeper.c | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/sheep/cluster/zookeeper.c b/sheep/cluster/zookeeper.c
index 7bd20bd..d2cb09b 100644
--- a/sheep/cluster/zookeeper.c
+++ b/sheep/cluster/zookeeper.c
@@ -71,6 +71,7 @@ static struct zk_event zk_levents[SD_MAX_NODES];
static int nr_zk_levents;
static unsigned zk_levent_head;
static unsigned zk_levent_tail;
+static bool called_by_zk_unblock;
static void *zk_node_btroot;
static struct zk_node *zk_master;
@@ -239,9 +240,11 @@ static int zk_queue_pop(zhandle_t *zh, struct zk_event *ev)
struct zk_event *lev;
eventfd_t value = 1;
- /* process leave event */
- if (uatomic_read(&zk_notify_blocked) <= 0 &&
- uatomic_read(&nr_zk_levents)) {
+ /*
+ * Continue to process LEAVE event even if
+ * we have an unfinished BLOCK event.
+ */
+ if (!called_by_zk_unblock && uatomic_read(&nr_zk_levents)) {
nr_levents = uatomic_sub_return(&nr_zk_levents, 1) + 1;
dprintf("nr_zk_levents:%d, head:%u\n", nr_levents, zk_levent_head);
@@ -282,6 +285,9 @@ static int zk_queue_pop(zhandle_t *zh, struct zk_event *ev)
return 0;
}
+ if (!called_by_zk_unblock && uatomic_read(&zk_notify_blocked) > 0)
+ return -1;
+
if (zk_queue_empty(zh))
return -1;
@@ -618,7 +624,9 @@ static void zk_unblock(void *msg, size_t msg_len)
struct zk_event ev;
eventfd_t value = 1;
+ called_by_zk_unblock = true;
rc = zk_queue_pop(zhandle, &ev);
+ called_by_zk_unblock = false;
assert(rc == 0);
ev.type = EVENT_NOTIFY;
@@ -656,9 +664,6 @@ static void zk_handler(int listen_fd, int events, void *data)
if (ret < 0)
return;
- if (uatomic_read(&zk_notify_blocked) > 0)
- return;
-
ret = zk_queue_pop(zhandle, &ev);
if (ret < 0)
goto out;
--
1.7.10.4