[sheepdog] [PATCH] sheep: fix handling of too old epoch in check_request

Thu May 31 05:00:09 CEST 2012

On 05/30/2012 10:41 PM, Christoph Hellwig wrote:
> On Tue, May 29, 2012 at 05:33:56AM -0400, Christoph Hellwig wrote:
>> If we hit a too old epoch in check_request (which can only happen
>> when called from resume_pending_requests) we currently try to use
>> io_op_done to decide what to do.  For objects that are stored locally
>> on a node that also acts as a gateway this is currently fatal:
>>
>>  (a) io_op_done tries to remove the request from a list it has never
>>      been added to
>>  (b) io_op_done completes the request despite adding it to the
>>      wait_rw_queue list
>>
>> In addition it seems that
>>
>>  (c) io_op_done was checking for a too large epoch, not a too small one
>>
>> If that last one was intentional it should at least be documented in there,
>> but I can't come up with a good explanation for it.
>>
>> Fix these issues by opencoding the action we want in check_request.
> 
> All the issues above are unrelated to calling check_request from
> resume_pending_requests, so is there a reason not to proceed with it?
> 
> Also does anyone disagree with my analysis of issue (c)?
> 

I agree with your issue (a), as I just mentioned, and I also agree with doing the check
in resume_pending_requests, as I did in my latest patch, though there I used queue_request
instead of check_request.

But I don't quite understand your issue (c). A gateway request without SD_FLAG_CMD_IO_LOCAL
indeed never receives SD_RES_NEW_NODE_VER, because the peer node waits until its epoch
equals the gateway's epoch, and only then does it respond.
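
To make the peer-side behaviour I am describing concrete, here is a rough sketch. The
enum, its values, and classify_epoch() are hypothetical names made up for illustration;
this is not the actual sheep code, only the three-way comparison as I understand it.

```c
#include <stdint.h>

/* Hypothetical outcome codes for illustration only; these are NOT
 * sheepdog's real SD_RES_* values. */
enum epoch_action {
	EPOCH_PROCEED,   /* epochs match: handle the request now */
	EPOCH_REPLY_OLD, /* sender is behind: report that its epoch is too old */
	EPOCH_WAIT,      /* sender is ahead: hold the request until we catch up */
};

/* Assumed peer-side decision: compare the epoch carried by the request
 * with the epoch the receiving node currently has. */
enum epoch_action classify_epoch(uint32_t req_epoch, uint32_t local_epoch)
{
	if (req_epoch < local_epoch)
		return EPOCH_REPLY_OLD;
	if (req_epoch > local_epoch)
		return EPOCH_WAIT;
	return EPOCH_PROCEED;
}
```

Under this assumption the "sender is ahead" case never produces a reply to the gateway;
the request just waits, which is why the gateway should not see SD_RES_NEW_NODE_VER there.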

thanks,

levin