[sheepdog] [PATCH] recovery: fix incomplete recovery because of faulty oid scheduling

Mon Apr 29 09:11:35 CEST 2013

At Mon, 29 Apr 2013 15:59:03 +0900,
MORITA Kazutaka wrote:
> 
> At Mon, 29 Apr 2013 13:15:55 +0800,
> Liu Yuan wrote:
> > 
> > On 04/29/2013 07:15 AM, MORITA Kazutaka wrote:
> > > When auto-recovery is disabled, sheep recovers the objects only in
> > > prio_oids and shouldn't reach here if the oid is in recovery, no?
> > 
> > No, this is why I saw bugs. I saw it happen only when there are two reads on the same oid during recovery. The first one triggers the oid scheduling (prepare/finish), and then the second one triggers it again (prepare/finish), which corrupts the oid list. The oids in rw->oids become '7c2b2500000003,7c2b2500000003,7c2b2500000006'; 7c2b2500000007 was ejected.
> > 
> > The following is the error log when the bug is reproduced by tests/010.
> > 
> > Apr 29 13:01:23 [main] queue_request(353) READ_OBJ, 1
> > Apr 29 13:01:23 [main] get_object_path(351) 0, /tmp/sheepdog/7/obj
> > Apr 29 13:01:23 [main] finish_schedule_oids(405) nr_recovered 0, nr_prio_oids 1, count 3 = new 3
> > Apr 29 13:01:23 [main] prepare_schedule_oid(252) 7c2b2500000003 nr_prio_oids 0
> > Apr 29 13:01:23 [main] request_in_recovery(200) 7c2b2500000003 wait on oid
> > Apr 29 13:01:23 [rw 4870] recover_object_work(205) done:0 count:3, oid:7c2b2500000003
> > Apr 29 13:01:23 [rw 4870] get_object_path(351) 0, /tmp/sheepdog/7/obj
> > Apr 29 13:01:23 [rw 4870] do_recover_object(147) try recover object 7c2b2500000003 from epoch 8
> > Apr 29 13:01:23 [rw 4870] sockfd_cache_get(387) 127.0.0.1:7007, idx 0
> > Apr 29 13:01:23 [main] client_handler(808) 1, rx 0, tx 3
> > Apr 29 13:01:23 [main] finish_rx(612) 27, 127.0.0.1:56182
> > Apr 29 13:01:23 [main] queue_request(353) READ_PEER, 1
> > Apr 29 13:01:23 [main] get_object_path(351) 0, /tmp/sheepdog/7/obj
> > Apr 29 13:01:23 [main] prepare_schedule_oid(252) 7c2b2500000003 nr_prio_oids 1
> > Apr 29 13:01:23 [io 4869] do_process_work(1359) a4, 7c2b2500000003, 8
> > Apr 29 13:01:23 [io 4869] get_object_path(351) 0, /tmp/sheepdog/7/obj
> > Apr 29 13:01:23 [io 4869] err_to_sderr(65) /tmp/sheepdog/7/obj
> > Apr 29 13:01:23 [io 4869] err_to_sderr(72) object 007c2b2500000003 not found locally
> > Apr 29 13:01:23 [io 4869] do_process_work(1366) failed: a4, 7c2b2500000003 , 8, 2
> > Apr 29 13:01:23 [main] io_op_done(67) unhandled error 2
> > Apr 29 13:01:23 [main] client_handler(808) 4, rx 0, tx 3
> > Apr 29 13:01:23 [main] finish_tx(699) connection from: 27, 127.0.0.1:56182
> > Apr 29 13:01:23 [rw 4870] sheep_exec_req(526) failed 2
> > Apr 29 13:01:23 [rw 4870] sockfd_cache_put(422) 127.0.0.1:7007 idx 0
> > Apr 29 13:01:23 [rw 4870] sockfd_cache_get(387) 127.0.0.1:7002, idx 0
> > Apr 29 13:01:23 [rw 4870] sockfd_cache_put(422) 127.0.0.1:7002 idx 0
> > Apr 29 13:01:23 [rw 4870] get_object_path(351) 0, /tmp/sheepdog/7/obj
> > Apr 29 13:01:23 [rw 4870] get_object_path(351) 0, /tmp/sheepdog/7/obj
> > Apr 29 13:01:23 [rw 4870] default_create_and_write(343) 7c2b2500000003
> > Apr 29 13:01:23 [rw 4870] recover_object_from_replica(111) recovered oid 7c2b2500000003 from 8 to epoch 8
> > Apr 29 13:01:23 [main] wakeup_requests_on_oid(255) retry 7c2b2500000003
> > Apr 29 13:01:23 [main] queue_request(353) READ_OBJ, 1
> > Apr 29 13:01:23 [main] get_object_path(351) 0, /tmp/sheepdog/7/obj
> > Apr 29 13:01:23 [main] oid_in_recovery(264) the object 7c2b2500000003 is already recoverd
> > Apr 29 13:01:23 [main] finish_schedule_oids(405) WARN: nr_recovered 1, nr_prio_oids 1, count 3 = new 4
> > 
> 
> Thanks, I understand what's going on from your log.
> 
> The problem is that already scheduled oids can be scheduled again.  I
> think the following is a better fix because it also avoids redundant
> scheduling even when auto-recovery is enabled.
> 
> ---- >8 ---- >8 ---- >8 ----
> diff --git a/sheep/recovery.c b/sheep/recovery.c
> index 23babe0..e9cfc02 100644
> --- a/sheep/recovery.c
> +++ b/sheep/recovery.c
> @@ -238,11 +238,15 @@ static inline void prepare_schedule_oid(uint64_t oid)
>  			return;
>  		}
>  	/*
> -	 * When auto recovery is enabled, the oid is currently being
> -	 * recovered
> +	 * rw->oids[rw->done..rw->nr_scheduled_prio_oids - 1] are
> +	 * already scheduled ones.
>  	 */
> -	if (!sys->disable_recovery && rw->oids[rw->done] == oid)
> -		return;
> +	for (i = rw->done; i < rw->nr_scheduled_prio_oids; i++)
> +		if (rw->oids[i] == oid) {
> +			sd_dprintf("oid %" PRIx64 " is already scheduled", oid);
> +			return;
> +		}
> +
>  	rw->nr_prio_oids++;
>  	rw->prio_oids = xrealloc(rw->prio_oids,
>  				 rw->nr_prio_oids * sizeof(uint64_t));
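
For readers following the thread, the duplicate check in the patch above
can be modeled in isolation.  This is only a minimal sketch under stated
assumptions, not the sheepdog code: struct recovery_model and
oid_already_scheduled() are hypothetical names, and only the fields the
hunk touches (oids, done, nr_scheduled_prio_oids) are modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the recovery work state.  Field names follow
 * the hunk above; everything else in the real structure is left out.
 */
struct recovery_model {
	uint64_t oids[8];              /* objects in recovery order */
	size_t done;                   /* oids[0..done-1] are recovered */
	size_t nr_scheduled_prio_oids; /* oids[done..this-1] are scheduled */
};

/*
 * Return true if oid already sits in the scheduled window
 * oids[done..nr_scheduled_prio_oids - 1], i.e. a second request for
 * the same oid must not be queued as a priority oid again.
 */
static bool oid_already_scheduled(const struct recovery_model *rw,
				  uint64_t oid)
{
	/* Assumption of this sketch: the window is never inverted. */
	assert(rw->done <= rw->nr_scheduled_prio_oids);

	for (size_t i = rw->done; i < rw->nr_scheduled_prio_oids; i++)
		if (rw->oids[i] == oid)
			return true;
	return false;
}
```

With oids = {7c2b2500000003, 7c2b2500000006, 7c2b2500000007}, done = 0
and nr_scheduled_prio_oids = 1, a second request for 7c2b2500000003 hits
the window and is skipped, while 7c2b2500000006 is still eligible for
scheduling.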

Please drop my patch.  It seems that we need more fixes for this problem.
I'll prepare another one.

Thanks,

Kazutaka