[sheepdog] [PATCH] recovery: fix incomplete recovery because of faulty oid scheduling

MORITA Kazutaka morita.kazutaka at gmail.com
Mon Apr 29 01:15:39 CEST 2013


At Sat, 27 Apr 2013 18:01:51 +0800,
Liu Yuan wrote:
> 
> From: Liu Yuan <tailai.ly at taobao.com>
> 
> tests/010 sometimes fails, which exposes this bug.
> 
> When auto-recovery is disabled, calling prepare_schedule_oid on the same oid
> multiple times causes finish_schedule_oid to wrongly squeeze a victim oid
> out of the rw->oids array, and this node then never gets a chance to
> recover the ejected oid.
> 
> Signed-off-by: Liu Yuan <tailai.ly at taobao.com>
> ---
>  sheep/recovery.c |   15 ++++++++-------
>  1 file changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/sheep/recovery.c b/sheep/recovery.c
> index 23babe0..6b39e0a 100644
> --- a/sheep/recovery.c
> +++ b/sheep/recovery.c
> @@ -237,18 +237,18 @@ static inline void prepare_schedule_oid(uint64_t oid)
>  				   oid);
>  			return;
>  		}
> -	/*
> -	 * When auto recovery is enabled, the oid is currently being
> -	 * recovered
> -	 */
> -	if (!sys->disable_recovery && rw->oids[rw->done] == oid)
> +	/* The oid is currently being recovered */
> +	if (rw->oids[rw->done] == oid) {

When auto-recovery is disabled, sheep recovers only the objects in
prio_oids, so it shouldn't reach here while the oid is being recovered, no?

> +		if (rw->suspended == true) {

"== true" is unnecessary.
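
To illustrate the point, here is a minimal sketch, assuming "suspended" is a
C99 bool as the comparison in the patch suggests. The names fake_rw,
is_suspended_verbose, is_suspended_plain and check_equivalence are
hypothetical and are not taken from sheep/recovery.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the recovery work struct; only the flag matters. */
struct fake_rw {
	bool suspended;
};

/* Style used in the patch: explicit comparison against true. */
bool is_suspended_verbose(const struct fake_rw *rw)
{
	return rw->suspended == true;
}

/* Suggested style: a bool is already a truth value. */
bool is_suspended_plain(const struct fake_rw *rw)
{
	return rw->suspended;
}

/* Both forms agree for either value of the flag. */
void check_equivalence(void)
{
	struct fake_rw rw = { .suspended = false };

	assert(is_suspended_verbose(&rw) == is_suspended_plain(&rw));
	rw.suspended = true;
	assert(is_suspended_verbose(&rw) == is_suspended_plain(&rw));
}
```

Under that assumption the two forms behave identically, so the shorter
"if (rw->suspended)" would read the same as the condition in the patch.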

Thanks,

Kazutaka


