[sheepdog] [PATCH] sheep: handle recovery request in check_request_in_recovery()

Christoph Hellwig hch at infradead.org
Sat Jun 2 16:57:35 CEST 2012


On Sat, Jun 02, 2012 at 10:51:05PM +0800, Liu Yuan wrote:
> Thanks, this is only changes this series brought to old recovery logic.
> This speedup looks valid, so I think other places misbehaves and
> uncovered with this patch.

From looking at the issue a bit, it seems the problem is that we don't
properly retry this case on the recovering sheep.  Here are the logs for
one of the objects that didn't get recovered properly:

# grep 7be7f90000000f /tmp/sheep/*/sheep.log
/tmp/sheep/7003/sheep.log:Jun 02 07:49:02 recover_object_work(253) done:1 count:4, oid:7be7f90000000f
/tmp/sheep/7003/sheep.log:Jun 02 07:49:02 do_recover_object(194) try recover object 7be7f90000000f from epoch 5
/tmp/sheep/7003/sheep.log:Jun 02 07:49:02 farm_link(549) try link 7be7f90000000f from snapshot with epoch 5
/tmp/sheep/7003/sheep.log:Jun 02 07:49:02 retrieve_object_from_snap(423) oid 7be7f90000000f, epoch 5, fail
/tmp/sheep/7003/sheep.log:Jun 02 07:49:02 do_recover_object(194) try recover object 7be7f90000000f from epoch 4
/tmp/sheep/7003/sheep.log:Jun 02 07:49:02 do_recover_object(194) try recover object 7be7f90000000f from epoch 3
/tmp/sheep/7003/sheep.log:Jun 02 07:49:02 do_recover_object(194) try recover object 7be7f90000000f from epoch 2
/tmp/sheep/7003/sheep.log:Jun 02 07:49:02 do_recover_object(194) try recover object 7be7f90000000f from epoch 1
/tmp/sheep/7003/sheep.log:Jun 02 07:49:03 do_recover_object(222) can not recover oid 7be7f90000000f
/tmp/sheep/7003/sheep.log:Jun 02 07:49:03 recover_object_work(262) failed to recover object 7be7f90000000f

I need to add a printout of the error value in the
recover_object_from_replica failure case, but I suspect we keep getting
SD_RES_OBJ_RECOVERING back from the target sheep, and then fall back
to older versions.  I guess do_recover_object simply needs to handle
SD_RES_OBJ_RECOVERING specially, e.g. by trying other recovery first but
going back to this object until we get a different return value.
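
One possible reading of that idea, as a rough standalone sketch rather than
actual sheep code: stop walking back through epochs as soon as a replica
answers "recovering", defer that object, and come back to it after the other
objects have been tried.  The result codes, fetch_fn, recover_one,
recover_all and busy_then_ok below are all made-up names for illustration;
none of them are the real sheepdog functions or values.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder result codes, NOT the values from sheepdog's headers. */
enum { RES_SUCCESS, RES_NO_OBJ, RES_OBJ_RECOVERING };

/* Hypothetical callback: ask a replica for oid as of the given epoch. */
typedef int (*fetch_fn)(uint64_t oid, uint32_t epoch, void *ctx);

/* Walk epochs newest-first, but stop on "recovering" so the caller can
 * retry later instead of falling back to an older copy. */
static int recover_one(uint64_t oid, uint32_t epoch, fetch_fn fetch, void *ctx)
{
    for (; epoch >= 1; epoch--) {
        int ret = fetch(oid, epoch, ctx);
        if (ret == RES_SUCCESS || ret == RES_OBJ_RECOVERING)
            return ret;
    }
    return RES_NO_OBJ;
}

/* Try every oid; those that answered "recovering" are compacted to the
 * front of the array and retried on the next pass, for at most max_passes
 * passes.  Returns how many oids were still deferred when we stopped.
 * Objects that fail outright are simply dropped in this sketch. */
static size_t recover_all(uint64_t *oids, size_t n, uint32_t epoch,
                          fetch_fn fetch, void *ctx, int max_passes)
{
    for (int pass = 0; n > 0 && pass < max_passes; pass++) {
        size_t deferred = 0;
        for (size_t i = 0; i < n; i++) {
            if (recover_one(oids[i], epoch, fetch, ctx) == RES_OBJ_RECOVERING)
                oids[deferred++] = oids[i];
        }
        n = deferred;
    }
    return n;
}

/* Toy replica for experimenting: reports "recovering" until the int
 * countdown behind ctx reaches zero, then succeeds at any epoch. */
static int busy_then_ok(uint64_t oid, uint32_t epoch, void *ctx)
{
    int *busy = ctx;
    (void)oid;
    (void)epoch;
    if (*busy > 0) {
        (*busy)--;
        return RES_OBJ_RECOVERING;
    }
    return RES_SUCCESS;
}
```

The pass limit is only there so the sketch cannot spin forever; whether the
real code should bound the retries, and how it should wait between them, is
a separate question.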


