From: levin li <xingke.lwp at taobao.com>

Consider this scenario:

Nodes A and B are both in recovery.
A is recovering object x from B, and object x has not yet been recovered by B.
B is recovering object y from A, and object y has not yet been recovered by A.

B responds to A with SD_RES_NEW_NODE_VER, and A likewise responds to B
with SD_RES_NEW_NODE_VER. A and B then keep retrying to recover objects
x and y, but each always gets SD_RES_NEW_NODE_VER and neither succeeds.
This is a deadlock that stops recovery from completing.

Fix this by skipping the is_recoverying_oid() check in check_request()
for requests that carry SD_FLAG_CMD_RECOVERY.

Signed-off-by: levin li <xingke.lwp at taobao.com>
---
 sheep/sdnet.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/sheep/sdnet.c b/sheep/sdnet.c
index 3518e4b..da946af 100644
--- a/sheep/sdnet.c
+++ b/sheep/sdnet.c
@@ -224,7 +224,8 @@ static int check_request(struct request *req)
 	if (!req->local_oid)
 		return 0;
 
-	if (is_recoverying_oid(req->local_oid)) {
+	if (is_recoverying_oid(req->local_oid) &&
+	    !(req->rq.flags & SD_FLAG_CMD_RECOVERY)) {
 		if (req->rq.flags & SD_FLAG_CMD_IO_LOCAL) {
 			/* Sheep peer request */
 			if (is_recovery_init()) {
-- 
1.7.10
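
For reviewers, the effect of the changed condition can be modelled in isolation. The following is only a sketch, not sheepdog code: the `DEMO_*` names, the flag value, and the helpers `check_before`, `check_after` and `mutual_recovery_attempts` are hypothetical stand-ins, and the model covers only the accept/reject decision, not what happens to a request after it passes the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real values are defined in the sheepdog headers. */
#define DEMO_FLAG_CMD_RECOVERY 0x01u
enum demo_result { DEMO_RES_SUCCESS, DEMO_RES_NEW_NODE_VER };

typedef enum demo_result (*check_fn)(bool oid_in_recovery, uint32_t flags);

/* Before the patch: every request for an object still under recovery is rejected. */
static enum demo_result check_before(bool oid_in_recovery, uint32_t flags)
{
	(void)flags;
	return oid_in_recovery ? DEMO_RES_NEW_NODE_VER : DEMO_RES_SUCCESS;
}

/* After the patch: requests carrying the recovery flag skip the rejection. */
static enum demo_result check_after(bool oid_in_recovery, uint32_t flags)
{
	if (oid_in_recovery && !(flags & DEMO_FLAG_CMD_RECOVERY))
		return DEMO_RES_NEW_NODE_VER;
	return DEMO_RES_SUCCESS;
}

/*
 * A asks B for x while B is still recovering x, and B asks A for y while
 * A is still recovering y. Returns the attempt on which both requests
 * pass the check, or -1 if they never do within max_attempts.
 */
static int mutual_recovery_attempts(check_fn check, int max_attempts)
{
	assert(max_attempts > 0);
	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		enum demo_result a_gets = check(true, DEMO_FLAG_CMD_RECOVERY);
		enum demo_result b_gets = check(true, DEMO_FLAG_CMD_RECOVERY);

		if (a_gets == DEMO_RES_SUCCESS && b_gets == DEMO_RES_SUCCESS)
			return attempt;
	}
	return -1;
}
```

In this model, `mutual_recovery_attempts(check_before, n)` never succeeds for any `n`, because neither side's state changes between retries, while `mutual_recovery_attempts(check_after, n)` passes on the first attempt. Ordinary requests without the recovery flag are still rejected while the object is under recovery.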