[sheepdog] [PATCH] recovery: fix incomplete recovery because of faulty oid scheduling

Mon Apr 29 08:59:03 CEST 2013

At Mon, 29 Apr 2013 13:15:55 +0800,
Liu Yuan wrote:
> 
> On 04/29/2013 07:15 AM, MORITA Kazutaka wrote:
> > When auto-recovery is disabled, sheep recovers the objects only in
> > prio_oids and shouldn't reach here if the oid is in recovery, no?
> 
> No, this is why I saw bugs. I saw it happen only when two reads hit the same oid during recovery. The first read triggers the oid scheduling (prepare/finish), and then the second read triggers it again (prepare/finish), which causes the faulty behavior. The oids in rw->oids end up as '7c2b2500000003,7c2b2500000003,7c2b2500000006', and 7c2b2500000007 is ejected.
> 
> The following is the error log from reproducing the bug with tests/010.
> 
> Apr 29 13:01:23 [main] queue_request(353) READ_OBJ, 1
> Apr 29 13:01:23 [main] get_object_path(351) 0, /tmp/sheepdog/7/obj
> Apr 29 13:01:23 [main] finish_schedule_oids(405) nr_recovered 0, nr_prio_oids 1, count 3 = new 3
> Apr 29 13:01:23 [main] prepare_schedule_oid(252) 7c2b2500000003 nr_prio_oids 0
> Apr 29 13:01:23 [main] request_in_recovery(200) 7c2b2500000003 wait on oid
> Apr 29 13:01:23 [rw 4870] recover_object_work(205) done:0 count:3, oid:7c2b2500000003
> Apr 29 13:01:23 [rw 4870] get_object_path(351) 0, /tmp/sheepdog/7/obj
> Apr 29 13:01:23 [rw 4870] do_recover_object(147) try recover object 7c2b2500000003 from epoch 8
> Apr 29 13:01:23 [rw 4870] sockfd_cache_get(387) 127.0.0.1:7007, idx 0
> Apr 29 13:01:23 [main] client_handler(808) 1, rx 0, tx 3
> Apr 29 13:01:23 [main] finish_rx(612) 27, 127.0.0.1:56182
> Apr 29 13:01:23 [main] queue_request(353) READ_PEER, 1
> Apr 29 13:01:23 [main] get_object_path(351) 0, /tmp/sheepdog/7/obj
> Apr 29 13:01:23 [main] prepare_schedule_oid(252) 7c2b2500000003 nr_prio_oids 1
> Apr 29 13:01:23 [io 4869] do_process_work(1359) a4, 7c2b2500000003, 8
> Apr 29 13:01:23 [io 4869] get_object_path(351) 0, /tmp/sheepdog/7/obj
> Apr 29 13:01:23 [io 4869] err_to_sderr(65) /tmp/sheepdog/7/obj
> Apr 29 13:01:23 [io 4869] err_to_sderr(72) object 007c2b2500000003 not found locally
> Apr 29 13:01:23 [io 4869] do_process_work(1366) failed: a4, 7c2b2500000003 , 8, 2
> Apr 29 13:01:23 [main] io_op_done(67) unhandled error 2
> Apr 29 13:01:23 [main] client_handler(808) 4, rx 0, tx 3
> Apr 29 13:01:23 [main] finish_tx(699) connection from: 27, 127.0.0.1:56182
> Apr 29 13:01:23 [rw 4870] sheep_exec_req(526) failed 2
> Apr 29 13:01:23 [rw 4870] sockfd_cache_put(422) 127.0.0.1:7007 idx 0
> Apr 29 13:01:23 [rw 4870] sockfd_cache_get(387) 127.0.0.1:7002, idx 0
> Apr 29 13:01:23 [rw 4870] sockfd_cache_put(422) 127.0.0.1:7002 idx 0
> Apr 29 13:01:23 [rw 4870] get_object_path(351) 0, /tmp/sheepdog/7/obj
> Apr 29 13:01:23 [rw 4870] get_object_path(351) 0, /tmp/sheepdog/7/obj
> Apr 29 13:01:23 [rw 4870] default_create_and_write(343) 7c2b2500000003
> Apr 29 13:01:23 [rw 4870] recover_object_from_replica(111) recovered oid 7c2b2500000003 from 8 to epoch 8
> Apr 29 13:01:23 [main] wakeup_requests_on_oid(255) retry 7c2b2500000003
> Apr 29 13:01:23 [main] queue_request(353) READ_OBJ, 1
> Apr 29 13:01:23 [main] get_object_path(351) 0, /tmp/sheepdog/7/obj
> Apr 29 13:01:23 [main] oid_in_recovery(264) the object 7c2b2500000003 is already recoverd
> Apr 29 13:01:23 [main] finish_schedule_oids(405) WARN: nr_recovered 1, nr_prio_oids 1, count 3 = new 4
> 

Thanks, I understood what's going on from your log.

The problem is that already scheduled oids can be scheduled again.  I
think the following is a better fix, because it also skips redundant
scheduling when auto-recovery is enabled.
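The re-scheduling problem can be illustrated with a toy model of the
scheduling before looking at the patch.  This is a hypothetical
simplification, not the code in sheep/recovery.c: struct toy_rw and the
functions toy_prepare_schedule_oid(), toy_finish_schedule_oids() and
toy_run_two_reads() are invented for illustration.  It also assumes that
finish_schedule_oids() splices prio oids in right after the already
scheduled region and keeps the list length fixed.  patched == false models
the pre-patch behaviour with auto-recovery disabled (the oids[done] check
is skipped); patched == true adds the check from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_OIDS 16

/* Hypothetical, simplified stand-in for sheepdog's struct recovery_work. */
struct toy_rw {
	uint64_t oids[MAX_OIDS];      /* recovery order */
	int count;                    /* logical length of oids[]; never grows */
	int done;                     /* index of the oid being recovered */
	int nr_scheduled_prio_oids;   /* oids[done..this - 1] already scheduled */
	uint64_t prio_oids[MAX_OIDS]; /* oids requested since the last finish */
	int nr_prio_oids;
};

static bool in_prio(const struct toy_rw *rw, uint64_t oid)
{
	for (int i = 0; i < rw->nr_prio_oids; i++)
		if (rw->prio_oids[i] == oid)
			return true;
	return false;
}

/* patched == true applies the patch's "already scheduled" check. */
static void toy_prepare_schedule_oid(struct toy_rw *rw, uint64_t oid,
				     bool patched)
{
	if (in_prio(rw, oid))
		return;
	if (patched)
		for (int i = rw->done; i < rw->nr_scheduled_prio_oids; i++)
			if (rw->oids[i] == oid)
				return;
	assert(rw->nr_prio_oids < MAX_OIDS);
	rw->prio_oids[rw->nr_prio_oids++] = oid;
}

/*
 * Put the prio oids right after the scheduled region and keep the other
 * oids in order.  Returns how many entries the rebuilt list needed;
 * anything beyond rw->count is dropped, like "count 3 = new 4" in the log.
 */
static int toy_finish_schedule_oids(struct toy_rw *rw)
{
	uint64_t new_oids[2 * MAX_OIDS];
	int start = rw->done > rw->nr_scheduled_prio_oids ?
		    rw->done : rw->nr_scheduled_prio_oids;
	int new_idx = 0;

	for (int i = 0; i < start; i++)
		new_oids[new_idx++] = rw->oids[i];
	for (int i = 0; i < rw->nr_prio_oids; i++)
		new_oids[new_idx++] = rw->prio_oids[i];
	for (int i = start; i < rw->count; i++)
		if (!in_prio(rw, rw->oids[i]))
			new_oids[new_idx++] = rw->oids[i];

	for (int i = 0; i < rw->count; i++)
		rw->oids[i] = new_oids[i];
	rw->nr_scheduled_prio_oids = start + rw->nr_prio_oids;
	rw->nr_prio_oids = 0;
	return new_idx;
}

/* Two reads of the same oid while it is still being recovered. */
static int toy_run_two_reads(bool patched, uint64_t out[3])
{
	struct toy_rw rw = {
		.oids = { 0x7c2b2500000006, 0x7c2b2500000007,
			  0x7c2b2500000003 },
		.count = 3,
	};
	int needed;

	toy_prepare_schedule_oid(&rw, 0x7c2b2500000003, patched);
	toy_finish_schedule_oids(&rw);
	/* rw.done stays 0: 7c2b2500000003 is still being recovered. */
	toy_prepare_schedule_oid(&rw, 0x7c2b2500000003, patched);
	needed = toy_finish_schedule_oids(&rw);
	for (int i = 0; i < rw.count; i++)
		out[i] = rw.oids[i];
	return needed;
}
```

In this model, toy_run_two_reads(false, out) returns 4 for a 3-entry list
and leaves out[] as 7c2b2500000003, 7c2b2500000003, 7c2b2500000006, so
7c2b2500000007 is ejected, matching the ordering reported above.
toy_run_two_reads(true, out) returns 3 and keeps all three oids.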

---- >8 ---- >8 ---- >8 ----

diff --git a/sheep/recovery.c b/sheep/recovery.c
index 23babe0..e9cfc02 100644
--- a/sheep/recovery.c
+++ b/sheep/recovery.c
@@ -238,11 +238,15 @@ static inline void prepare_schedule_oid(uint64_t oid)
 			return;
 		}
 	/*
-	 * When auto recovery is enabled, the oid is currently being
-	 * recovered
+	 * rw->oids[rw->done..rw->nr_scheduled_prio_oids - 1] are
+	 * already scheduled ones.
 	 */
-	if (!sys->disable_recovery && rw->oids[rw->done] == oid)
-		return;
+	for (i = rw->done; i < rw->nr_scheduled_prio_oids; i++)
+		if (rw->oids[i] == oid) {
+			sd_dprintf("oid %" PRIx64 " is already scheduled", oid);
+			return;
+		}
+
 	rw->nr_prio_oids++;
 	rw->prio_oids = xrealloc(rw->prio_oids,
 				 rw->nr_prio_oids * sizeof(uint64_t));