[sheepdog] [PATCH] recovery: fix incomplete recovery because of faulty oid scheduling
Liu Yuan
namei.unix at gmail.com
Sat Apr 27 12:01:51 CEST 2013
From: Liu Yuan <tailai.ly at taobao.com>
tests/010 sometimes fails because of this bug.
When auto-recovery is disabled, prepare_schedule_oid() can be called multiple
times on the same oid. This causes finish_schedule_oid() to wrongly squeeze a
victim oid out of the rw->oids array, and this node then never gets a chance
to recover the ejected oid.
Signed-off-by: Liu Yuan <tailai.ly at taobao.com>
---
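Note for reviewers: the failure mode described above can be illustrated with a
small toy model. This is only a sketch and not the sheepdog code. struct toy_rw
and all toy_* helpers are hypothetical names, and the way the list is rebuilt
(recovered prefix, then prioritized oids, then the remainder, capped at the
original count) is an assumption made for illustration. It shows how queueing
the oid that is currently being recovered can push the last oid out of a
fixed-size list, and how the "currently being recovered" check avoids that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_OIDS 16

/* Hypothetical, simplified stand-in for the recovery work state. */
struct toy_rw {
	uint64_t oids[MAX_OIDS];      /* oids to recover, in order */
	int count;                    /* number of valid entries in oids */
	int done;                     /* index of the oid being recovered */
	uint64_t prio_oids[MAX_OIDS]; /* oids requested with priority */
	int nr_prio_oids;
};

static bool toy_in_prio(const struct toy_rw *rw, uint64_t oid)
{
	for (int i = 0; i < rw->nr_prio_oids; i++)
		if (rw->prio_oids[i] == oid)
			return true;
	return false;
}

/* Queue an oid; with guard set, skip the oid currently being recovered. */
static void toy_prepare(struct toy_rw *rw, uint64_t oid, bool guard)
{
	if (toy_in_prio(rw, oid))
		return;
	if (guard && rw->oids[rw->done] == oid)
		return;
	assert(rw->nr_prio_oids < MAX_OIDS);
	rw->prio_oids[rw->nr_prio_oids++] = oid;
}

/*
 * Assumed rebuild order: entries [0..done], then the prioritized oids,
 * then the remaining oids that are not prioritized. The result is capped
 * at count entries, so any surplus entry squeezes out the tail.
 */
static void toy_finish(struct toy_rw *rw)
{
	uint64_t new_oids[MAX_OIDS] = { 0 };
	int idx = 0;

	for (int i = 0; i <= rw->done; i++)
		new_oids[idx++] = rw->oids[i];
	for (int i = 0; i < rw->nr_prio_oids && idx < rw->count; i++)
		new_oids[idx++] = rw->prio_oids[i];
	for (int i = rw->done + 1; i < rw->count && idx < rw->count; i++)
		if (!toy_in_prio(rw, rw->oids[i]))
			new_oids[idx++] = rw->oids[i];

	memcpy(rw->oids, new_oids, rw->count * sizeof(uint64_t));
	rw->nr_prio_oids = 0;
}

static bool toy_contains(const struct toy_rw *rw, uint64_t oid)
{
	for (int i = 0; i < rw->count; i++)
		if (rw->oids[i] == oid)
			return true;
	return false;
}

/* Returns true if the last oid (0xd) is still scheduled after the rebuild. */
static bool toy_run(bool guard)
{
	struct toy_rw rw = {
		.oids = { 0xa, 0xb, 0xc, 0xd }, .count = 4, .done = 1,
	};

	toy_prepare(&rw, 0xb, guard); /* 0xb is the oid being recovered */
	toy_finish(&rw);
	return toy_contains(&rw, 0xd);
}
```

In this model, toy_run(false) loses oid 0xd because 0xb is counted twice in
the rebuilt list, while toy_run(true) keeps all four oids.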
sheep/recovery.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/sheep/recovery.c b/sheep/recovery.c
index 23babe0..6b39e0a 100644
--- a/sheep/recovery.c
+++ b/sheep/recovery.c
@@ -237,18 +237,18 @@ static inline void prepare_schedule_oid(uint64_t oid)
oid);
return;
}
- /*
- * When auto recovery is enabled, the oid is currently being
- * recovered
- */
- if (!sys->disable_recovery && rw->oids[rw->done] == oid)
+ /* The oid is currently being recovered */
+ if (rw->oids[rw->done] == oid) {
+ if (rw->suspended == true) {
+ rw->suspended = false;
+ queue_work(sys->recovery_wqueue, &rw->work);
+ }
return;
+ }
rw->nr_prio_oids++;
rw->prio_oids = xrealloc(rw->prio_oids,
rw->nr_prio_oids * sizeof(uint64_t));
rw->prio_oids[rw->nr_prio_oids - 1] = oid;
- resume_suspended_recovery();
-
sd_dprintf("%"PRIx64" nr_prio_oids %d", oid, rw->nr_prio_oids);
}
@@ -291,6 +291,7 @@ bool oid_in_recovery(uint64_t oid)
}
prepare_schedule_oid(oid);
+ resume_suspended_recovery();
return true;
}
--
1.7.9.5