[sheepdog] [PATCH] sheep: fix a bug that hangs the cluster during recovery
levin li
levin108 at gmail.com
Mon Aug 27 12:45:07 CEST 2012
From: levin li <xingke.lwp at taobao.com>
During recovery, a VDI creation request may wait for recovery to
complete. VDI creation is a cluster request, which prevents other
cluster requests from being processed. When recovery reaches
notify_recovery_completion_work, it issues another cluster request,
SD_OP_COMPLETE_RECOVERY, which is blocked by the VDI creation. As a
result, notify_recovery_completion_work blocks the recovery_wqueue.
If a new recovery then starts, it is blocked as well, while the VDI
creation request may be waiting for that very recovery to complete,
so the cluster deadlocks.

Fix this by queueing the completion notification on a dedicated
recovery_notify_wqueue so that it can no longer stall recovery_wqueue.
Signed-off-by: levin li <xingke.lwp at taobao.com>
---
 sheep/recovery.c   |    2 +-
 sheep/sheep.c      |    1 +
 sheep/sheep_priv.h |    1 +
 3 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/sheep/recovery.c b/sheep/recovery.c
index 2232110..59ac9d6 100644
--- a/sheep/recovery.c
+++ b/sheep/recovery.c
@@ -373,7 +373,7 @@ static inline void finish_recovery(struct recovery_work *rw)
 	/* notify recovery completion to other nodes */
 	rw->work.fn = notify_recovery_completion_work;
 	rw->work.done = notify_recovery_completion_main;
-	queue_work(sys->recovery_wqueue, &rw->work);
+	queue_work(sys->recovery_notify_wqueue, &rw->work);
 
 	dprintf("recovery complete: new epoch %"PRIu32"\n",
 		sys->recovered_epoch);
diff --git a/sheep/sheep.c b/sheep/sheep.c
index 31af42c..10c0501 100644
--- a/sheep/sheep.c
+++ b/sheep/sheep.c
@@ -370,6 +370,7 @@ int main(int argc, char **argv)
 	sys->gateway_wqueue = init_work_queue("gateway", false);
 	sys->io_wqueue = init_work_queue("io", false);
 	sys->recovery_wqueue = init_work_queue("recovery", true);
+	sys->recovery_notify_wqueue = init_work_queue("recovery notify", true);
 	sys->deletion_wqueue = init_work_queue("deletion", true);
 	sys->block_wqueue = init_work_queue("block", true);
 	sys->sockfd_wqueue = init_work_queue("sockfd", true);
diff --git a/sheep/sheep_priv.h b/sheep/sheep_priv.h
index 1f5a1bd..90006f6 100644
--- a/sheep/sheep_priv.h
+++ b/sheep/sheep_priv.h
@@ -115,6 +115,7 @@ struct cluster_info {
 	struct work_queue *io_wqueue;
 	struct work_queue *deletion_wqueue;
 	struct work_queue *recovery_wqueue;
+	struct work_queue *recovery_notify_wqueue;
 	struct work_queue *block_wqueue;
 	struct work_queue *sockfd_wqueue;
 	struct work_queue *reclaim_wqueue;
--
1.7.1