On 05/24/2012 11:37 AM, levin li wrote:
> From: levin li <xingke.lwp at taobao.com>
>
> Consider this scenario:
>
> Nodes A and B are both in recovery.
> A is recovering object x from B,
> and object x hasn't been recovered by B.
> B is recovering object y from A,
> and object y hasn't been recovered by A.
>
> Then B will respond to A with result SD_RES_NEW_NODE_VER, and
> A will also respond to B with result SD_RES_NEW_NODE_VER. A and B
> will then keep retrying to recover objects x and y, but always
> get the response SD_RES_NEW_NODE_VER and never succeed, so this is a
> deadlock that stops the recovery from completing.
>
> Signed-off-by: levin li <xingke.lwp at taobao.com>
> ---
>  sheep/sdnet.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/sheep/sdnet.c b/sheep/sdnet.c
> index 3518e4b..da946af 100644
> --- a/sheep/sdnet.c
> +++ b/sheep/sdnet.c
> @@ -224,7 +224,8 @@ static int check_request(struct request *req)
>  	if (!req->local_oid)
>  		return 0;
>
> -	if (is_recoverying_oid(req->local_oid)) {
> +	if (is_recoverying_oid(req->local_oid) &&
> +	    !(req->rq.flags & SD_FLAG_CMD_RECOVERY)) {

We'd better add a comment here explaining why we do this.

>  		if (req->rq.flags & SD_FLAG_CMD_IO_LOCAL) {
>  			/* Sheep peer request */
>  			if (is_recovery_init()) {
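
For example, the commented check could read roughly like the sketch below. This is only an illustration of the condition in the patch, not the actual sheep code: the flag values, the single hard-coded recovering OID, and the helper name blocked_by_recovery() are all assumptions made so the snippet stands alone.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values; the real definitions live in the sheepdog headers. */
#define SD_FLAG_CMD_IO_LOCAL 0x01
#define SD_FLAG_CMD_RECOVERY 0x02

/* Stand-in for the real is_recoverying_oid(): pretend only OID 42 is in recovery. */
static bool is_recoverying_oid(uint64_t oid)
{
	return oid == 42;
}

/*
 * Hypothetical helper: returns true when a request has to be held back
 * because the object it targets is still being recovered.
 */
static bool blocked_by_recovery(uint64_t local_oid, uint16_t flags)
{
	if (!local_oid)
		return false;

	/*
	 * Requests issued by a peer's recovery must not be held back here.
	 * Otherwise two nodes recovering objects from each other would both
	 * keep answering SD_RES_NEW_NODE_VER and neither recovery would
	 * ever complete.
	 */
	return is_recoverying_oid(local_oid) &&
	       !(flags & SD_FLAG_CMD_RECOVERY);
}
```

With something like this, a normal I/O request for an object in recovery is still held back, while a request carrying SD_FLAG_CMD_RECOVERY goes through.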