[sheepdog] [PATCH] sheep: handle recovery request in check_request_in_recovery()

Liu Yuan namei.unix at gmail.com
Sat Jun 2 17:34:37 CEST 2012


On 06/02/2012 11:30 PM, Liu Yuan wrote:

> On 06/02/2012 11:24 PM, Christoph Hellwig wrote:
> 
>> From looking at a bit more instrumentation the problem is the following:
>>
>>  - the cluster only has nr_copies = 2
>>  - one zone already went down, leaving only one copy of the object
>>  - the sheep that has the copy stays in RW_INIT state for a while
>>    so we get a SD_RES_OBJ_RECOVERING completion
> 
> 
> int is_recoverying_oid(uint64_t oid)
> {
>         .....
>         if (sd_store->exist(oid)) {
>                 dprintf("the object %" PRIx64 " is already recoverd\n", oid);
>                 return 0;
>         }
> 
>         if (rw->state == RW_INIT)
>                 return 1;
>         .......
> }
> 
> So if the object does exist on the targeted node,
> is_recoverying_oid() will return false and we don't get this
> SD_RES_OBJ_RECOVERING.
> 


On second thought, we should just revert this patch, because in the
case where the requested object is not in the working directory but in
the snap cache (removed by the targeted node for end_recovery()), we
really should go down!
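
To make the two cases concrete, here is a minimal standalone sketch of the
check order quoted above. It is not the sheepdog code: fake_store_exist(),
sketch_is_recovering(), struct recovery_work and the RW_RUN state are
hypothetical stand-ins, and the parts elided in the quoted function stay
elided here.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration; not the sheepdog definitions. */
enum rw_state { RW_INIT, RW_RUN };

struct recovery_work {
        enum rw_state state;
};

/* Hypothetical store: pretend only object 0x1 is in the working directory. */
static int fake_store_exist(uint64_t oid)
{
        return oid == 0x1;
}

/* Mirrors only the order of the two checks quoted in this thread. */
static int sketch_is_recovering(const struct recovery_work *rw, uint64_t oid)
{
        /* Object already in the working directory: not recovering. */
        if (fake_store_exist(oid)) {
                printf("the object %" PRIx64 " is already recovered\n", oid);
                return 0;
        }

        /* Object missing while recovery is still initializing. */
        if (rw->state == RW_INIT)
                return 1;

        /* The real function does more here; that part is elided above. */
        return 0;
}
```

With the state at RW_INIT, an object the stub store knows about yields 0,
so the caller would not see SD_RES_OBJ_RECOVERING, while an object missing
from the working directory yields 1.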

Thanks,
Yuan


