[sheepdog] [PATCH] sheep: fix a dead-lock bug in screen_object_list()

Mon Nov 26 10:50:33 CET 2012

At Mon, 26 Nov 2012 13:48:50 +0800,
levin li wrote:
> 
> From: levin li <xingke.lwp at taobao.com>
> 
> The bug is introduced in the patch 97ccd87ea15e606b6ec9fecb54f5de453f9c5c1f
> which causes the cluster hangs in recovery, this patch is like a revert patch
> except that I add comment to explain why we should not ignore the objects
> whose nr_objs are zero in screen_object_list().
> 
> screen_object_list() is called in recovery, if a VDI creation is in progress,
> then the VDI creation operation is blocked by gateway to wait for completion of
> recovery, and if we make screen_object_list() to retry infinitely to wait the
> nr_objs to be not zero, which also means it waits for VDI creation to complete,
> so, both VDI creation and recovery can not finish, it's a dead-lock.
> 
> As a fact, we don't need to recovery the objects whose nr_objs are zero, these
> objects can only be VDI objects which has not been created completely because
> of a recovery, after the recovery finishes, gateway would retry to create the
> objects and put it into correct location.
> 
> Signed-off-by: levin li <xingke.lwp at taobao.com>
> ---
>  sheep/recovery.c |   15 ++++++++-------
>  1 file changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/sheep/recovery.c b/sheep/recovery.c
> index 9c5c712..bb71cba 100644
> --- a/sheep/recovery.c
> +++ b/sheep/recovery.c
> @@ -539,17 +539,18 @@ static void screen_object_list(struct recovery_work *rw,
>  	int i, j;
>  
>  	for (i = 0; i < nr_oids; i++) {
> -again:
>  		nr_objs = get_obj_copy_number(oids[i], rw->cur_vinfo->nr_zones);
>  		if (!nr_objs) {
>  			dprintf("can not find copy number for object %" PRIx64
>  				"\n", oids[i]);
> -			dprintf("probably, vdi was created but "
> -				"post_cluster_new_vdi() is not called yet\n");
> -			/* FIXME: can we wait for post_cluster_new_vdi
> -			 *        with a better way? */
> -			sleep(1);
> -			goto again;
> +			/*
> +			 * If nr_objs is zero, then the object is a VDI object,
> +			 * and the creation of the VDI is in progress, then we
> +			 * don't need to recover this object, as after recovery,
> +			 * the gateway would retry to complete the creation by
> +			 * putting the objects of the VDI into corrent location.
> +			 */

This is not correct.  If recovery happens after cluster_new_vdi()
succeeds in vdi creation and before post_cluster_new_vdi() sets
vdi.copies, there is no one who will retry vdi creation.  I actually
encountered this problem before and could reproduce it with the
current test framework.

That being said, dead lock should be definitely avoided.  I suggest
adding the problem introduced by this revert as FIXME.

Thanks,

Kazutaka