[sheepdog] [PATCH 1/2] zookeeper: handling lost block/notify events during session timeout

Kai Zhang kyle at zelin.io
Tue Jul 2 08:31:56 CEST 2013


If the zookeeper session has timed out, zk_block() and zk_notify() simply
do nothing, and the block/notify event is lost.
However, these events have already been added to pending_block_list and
pending_notify_list. When a new cluster operation arrives, this leads
to undefined behavior.

This patch fixes the problem by re-issuing the unhandled block/notify
events once the connection to zookeeper has been re-established.

Signed-off-by: Kai Zhang <kyle at zelin.io>
---
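The failure mode and the fix described above can be illustrated with a
small standalone sketch. It does not use any sheepdog code: mock_driver,
pending, queue_notify(), queue_block() and reconnect() are hypothetical
names invented for this illustration, and the pending lists are reduced
to plain counters. It assumes, as in the description above, that an event
is recorded as pending before it is handed to a driver that silently
drops it while the session is down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cluster driver; not the sheepdog API. */
struct mock_driver {
	bool session_alive;	/* false while the session is timed out */
	int notify_sent;	/* notify events that reached the cluster */
	int block_sent;		/* block events that reached the cluster */
};

/* Hypothetical pending queues, reduced to counters for brevity. */
struct pending {
	int notify;
	int block;
};

/* Silently drops the event when the session is gone. */
static void mock_notify(struct mock_driver *drv)
{
	if (!drv->session_alive)
		return;
	drv->notify_sent++;
}

static void mock_block(struct mock_driver *drv)
{
	if (!drv->session_alive)
		return;
	drv->block_sent++;
}

/* The event is recorded as pending first, then handed to the driver. */
static void queue_notify(struct pending *p, struct mock_driver *drv)
{
	p->notify++;
	mock_notify(drv);
}

static void queue_block(struct pending *p, struct mock_driver *drv)
{
	p->block++;
	mock_block(drv);
}

/* On reconnect, replay every event that is still pending. */
static void reconnect(struct pending *p, struct mock_driver *drv)
{
	int i;

	assert(p != NULL && drv != NULL);
	drv->session_alive = true;
	for (i = 0; i < p->notify; i++)
		mock_notify(drv);
	for (i = 0; i < p->block; i++)
		mock_block(drv);
}
```

Queuing events while session_alive is false leaves both sent counters at
zero even though the pending counters grow; calling reconnect() afterwards
delivers each pending event once.
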
 sheep/group.c |   19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/sheep/group.c b/sheep/group.c
index 2d4a25c..de315c3 100644
--- a/sheep/group.c
+++ b/sheep/group.c
@@ -1101,13 +1101,32 @@ static int send_join_request(struct sd_node *ent)
 
 int sd_reconnect_handler(void)
 {
+	struct request *req;
+
 	sys->status = SD_STATUS_WAIT_FOR_JOIN;
 	sys->join_finished = false;
+
 	if (sys->cdrv->init(sys->cdrv_option) != 0)
 		return -1;
 	if (send_join_request(&sys->this_node) != 0)
 		return -1;
 
+	list_for_each_entry(req, main_thread_get(pending_notify_list),
+			    pending_list) {
+		struct vdi_op_message *msg;
+		size_t size;
+		msg = prepare_cluster_msg(req, &size);
+		msg->rsp.result = SD_RES_SUCCESS;
+		sys->cdrv->notify(msg, size);
+		free(msg);
+	}
+
+	list_for_each_entry(req, main_thread_get(pending_block_list),
+			    pending_list) {
+		sys->cdrv->block();
+	}
+	cluster_op_running = false;
+
 	return 0;
 }
 
-- 
1.7.9.5