[sheepdog] [PATCH v3 7/8] recovery: fix a race condition in recovery

Liu Yuan namei.unix at gmail.com
Thu May 24 09:48:59 CEST 2012


On 05/24/2012 11:37 AM, levin li wrote:

> From: levin li <xingke.lwp at taobao.com>
> 
> Consider this scenario:
> 
> Nodes A and B are both in recovery.
> A is recovering object x from B,
> but B has not yet recovered object x.
> B is recovering object y from A,
> but A has not yet recovered object y.
> 
> B then responds to A with SD_RES_NEW_NODE_VER, and A likewise
> responds to B with SD_RES_NEW_NODE_VER. A and B keep retrying the
> recovery of objects x and y, but always get SD_RES_NEW_NODE_VER back
> and never succeed. The result is a deadlock that stops recovery
> from completing.
> 
> Signed-off-by: levin li <xingke.lwp at taobao.com>
> ---
>  sheep/sdnet.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/sheep/sdnet.c b/sheep/sdnet.c
> index 3518e4b..da946af 100644
> --- a/sheep/sdnet.c
> +++ b/sheep/sdnet.c
> @@ -224,7 +224,8 @@ static int check_request(struct request *req)
>  	if (!req->local_oid)
>  		return 0;
>  
> -	if (is_recoverying_oid(req->local_oid)) {
> +	if (is_recoverying_oid(req->local_oid) &&
> +		!(req->rq.flags & SD_FLAG_CMD_RECOVERY)) {


We'd better add a comment here explaining why we do this.
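
For illustration, here is a rough standalone sketch of the condition with the kind of comment I have in mind. It is not the actual sheep code: the helper name and the SKETCH_* flag values are made up for this example (the real flags are defined in the sheepdog headers), and it models only the test in the hunk above, not the rest of check_request().

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder values for this sketch only; not the real sheepdog definitions. */
#define SKETCH_FLAG_CMD_IO_LOCAL 0x0010
#define SKETCH_FLAG_CMD_RECOVERY 0x0020

/*
 * Hypothetical helper: returns true when a request for an object must
 * go through the "object is still being recovered" handling.
 */
static bool sketch_blocked_by_recovery(bool oid_in_recovery, uint16_t flags)
{
	/*
	 * Recovery requests from a peer must not be held back just because
	 * the object is still marked as recovering on this node. Otherwise
	 * two nodes that recover objects from each other keep answering
	 * SD_RES_NEW_NODE_VER and neither recovery ever completes.
	 */
	if (flags & SKETCH_FLAG_CMD_RECOVERY)
		return false;

	/* Every other request is gated on the recovery state of the object. */
	return oid_in_recovery;
}
```

With this shape, an ordinary peer request for an object under recovery is still gated, while a request carrying the recovery flag always passes the check.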

>  		if (req->rq.flags & SD_FLAG_CMD_IO_LOCAL) {
>  			/* Sheep peer request */
>  			if (is_recovery_init()) {


