[Sheepdog] [PATCH] sheep: bug in event_done leads to dead lock

Yunkai Zhang yunkai.me at gmail.com
Fri Apr 27 15:15:39 CEST 2012


From: Yunkai Zhang <qiushu.zyk at taobao.com>

A deadlock was found in the following scenario:

Suppose there are two sheep, S1 and S2, and their event_queues are
empty.

S1 receives a notify message, M1, and calls sd_notify_handler(),
which adds a notify event to its event_queue and then calls
process_request_event_queues() to queue_work this event.

At the same time, S2 sends a notify message, M2, to the cluster, and
submits an I/O request (e.g. a do_lookup_vdi operation) to S1 when it
calls zk_dispatch() to handle M2.

After S1 receives the I/O request from S2, it eventually calls
process_request_event_queues() to handle it. If S1 calls this
function before M1's event_done() has finished, the I/O request is
not processed because the event_queue is not yet empty. When
event_done() later checks the event_queue, it finds it empty and
skips process_request_event_queues(), so the queued I/O request is
left waiting. This leads to a deadlock between S1 and S2: S2 blocks
in read() waiting for the response from S1, and the whole cluster is
suspended forever.

To fix this problem, make event_done() call
process_request_event_queues() unconditionally, so that the request
queue is still processed when the event_queue is empty.
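
The scenario above can be reproduced in a small standalone model. This
is only a sketch under simplifying assumptions, not the sheep code:
struct model, process_queues(), event_done_model() and run_scenario()
are hypothetical names, the queues are reduced to counters, and an
event is assumed to stay counted in the event queue until its done
handler runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the sheep state (assumption). */
struct model {
	int events_queued;    /* stands in for sys->event_queue          */
	int requests_queued;  /* stands in for the I/O request queue     */
	bool event_running;   /* head event has been handed to a worker  */
	int requests_served;  /* how many I/O requests were processed    */
};

/*
 * Stand-in for process_request_event_queues(): while any event is
 * queued, only events are dispatched and requests are deferred.
 */
static void process_queues(struct model *m)
{
	if (m->events_queued > 0) {
		m->event_running = true;
		return;
	}
	while (m->requests_queued > 0) {
		m->requests_queued--;
		m->requests_served++;
	}
}

/*
 * Stand-in for event_done(). With patched == false it mirrors the old
 * check (only re-run the queues if an event is still queued); with
 * patched == true it always re-runs them.
 */
static void event_done_model(struct model *m, bool patched)
{
	assert(m->events_queued > 0);
	m->events_queued--;
	m->event_running = false;

	if (patched || m->events_queued > 0)
		process_queues(m);
}

/* Replays the M1 / I/O request interleaving described above. */
static int run_scenario(bool patched)
{
	struct model m = { 0, 0, false, 0 };

	m.events_queued = 1;          /* S1 receives M1                */
	process_queues(&m);           /* M1 is dispatched              */
	m.requests_queued = 1;        /* I/O request from S2 arrives   */
	process_queues(&m);           /* deferred: event still queued  */
	event_done_model(&m, patched);

	return m.requests_served;
}
```

In this model run_scenario(false) returns 0, because the request is
left queued after M1 completes, while run_scenario(true) returns 1.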

Signed-off-by: Yunkai Zhang <qiushu.zyk at taobao.com>
---
 sheep/group.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/sheep/group.c b/sheep/group.c
index b4cf2da..7e19d33 100644
--- a/sheep/group.c
+++ b/sheep/group.c
@@ -964,8 +964,7 @@ static void event_done(struct work *work)
 	if (ret)
 		panic("failed to register event fd");
 
-	if (!list_empty(&sys->event_queue))
-		process_request_event_queues();
+	process_request_event_queues();
 }
 
 int is_access_to_busy_objects(uint64_t oid)
-- 
1.7.7.6
