[sheepdog] [PATCH] sheep: fix handling of too old epoch in check_request

levin li levin108 at gmail.com
Thu May 31 10:40:31 CEST 2012


On 05/31/2012 04:22 PM, Christoph Hellwig wrote:
> On Thu, May 31, 2012 at 11:00:09AM +0800, levin li wrote:
>> But I don't quite understand your issue (c). A gateway request without SD_FLAG_CMD_IO_LOCAL
>> indeed never receives SD_RES_NEW_NODE_VER, because the peer node waits until its epoch
>> equals the gateway's epoch, and only then responds.
> 
> Sorry if my description of the issue was too short, what I mean is this
> code in io_op_done:
> 
> 		if (!(req->rq.flags & SD_FLAG_CMD_IO_LOCAL)) {
> 			if (req->rp.epoch > sys->epoch &&
> 			    req->rp.result == SD_RES_OLD_NODE_VER) {
> 				list_add_tail(&req->request_list,
> 						&sys->wait_rw_queue);
> 			} else
> 				goto retry;
> 
> First the problem is that it adds the request to wait_rw_queue but still
> completes it, second I do not quite understand what the test is for.
> 
> SD_RES_OLD_NODE_VER means a node rejected the request because sys->epoch
> was newer than req->rq.epoch.  The code above tests for a req->rp.epoch
> larger than sys_epoch after that happens, which sounds odd.
> 
> I think the gateway should always retry in case of a SD_RES_OLD_NODE_VER
> return from the I/O node, assuming the epoch update has propagated to it
> in the mean time.

Well, there is a bug: we should return right after putting the request into wait_rw_queue.
Thanks for pointing it out.

But I still think you misunderstood my purpose. Let me explain:

Gateway(epoch 3)  -------------------- req (IO_LOCAL) --------------------> Peer (epoch 8)
(rp.epoch = 8)                        <------- send back rsp --------- (set epoch, result)
(rp.result = SD_RES_OLD_NODE_VER)

The peer node finds that the gateway's request, with epoch 3, is older than its own system
epoch 8. It sets the response epoch to 8 to tell the gateway not to retry the request until
the gateway's system epoch reaches 8, and at the same time sets the result to SD_RES_OLD_NODE_VER.
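
A rough sketch of the peer side, only to illustrate the step above. It is not the
actual check_request code: struct peer_rsp, the PEER_RES_* values and peer_check_epoch()
are hypothetical names, and only the "request epoch is too old" case is covered.

```c
#include <stdint.h>

/* Hypothetical stand-ins for sheepdog's result codes; values are arbitrary. */
enum peer_result { PEER_RES_SUCCESS, PEER_RES_OLD_NODE_VER };

/* Hypothetical, reduced response header: only the two fields discussed here. */
struct peer_rsp {
	uint32_t epoch;
	enum peer_result result;
};

/*
 * The peer compares the epoch carried by the request with its own system
 * epoch.  It always reports its own epoch back, and flags the request as
 * too old when the request epoch is behind.
 */
struct peer_rsp peer_check_epoch(uint32_t sys_epoch, uint32_t rq_epoch)
{
	struct peer_rsp rsp = { .epoch = sys_epoch, .result = PEER_RES_SUCCESS };

	if (rq_epoch < sys_epoch)
		rsp.result = PEER_RES_OLD_NODE_VER;

	return rsp;
}
```

With the numbers from the diagram, peer_check_epoch(8, 3) gives epoch 8 and the
"old node version" result.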

As for the gateway, a response from the peer node with SD_RES_OLD_NODE_VER means the
request needs to be retried. But when? The gateway checks req->rp.epoch, which is the
peer's epoch. If the gateway's system epoch is still older than the peer's epoch, the
request should not be resent immediately but put into wait_rw_queue to wait for a new
epoch. If the gateway's epoch is equal to or newer than the peer's epoch, it is resent.
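
The decision on the gateway side can be sketched as a small pure function. This is a
simplified illustration, not the io_op_done code itself: enum gw_result, enum gw_action
and gateway_on_response() are hypothetical names, and "wait" stands for putting the
request on wait_rw_queue and returning without completing it.

```c
#include <stdint.h>

/* Hypothetical stand-ins for sheepdog's result codes; values are arbitrary. */
enum gw_result { GW_RES_SUCCESS, GW_RES_OLD_NODE_VER };

/* What the gateway does with a request once the peer's response is in. */
enum gw_action {
	GW_DONE,           /* nothing epoch-related to do */
	GW_RETRY_NOW,      /* resend the request immediately */
	GW_WAIT_FOR_EPOCH  /* park the request until a new epoch arrives */
};

/*
 * sys_epoch is the gateway's current system epoch, rp_epoch is the epoch
 * the peer wrote into the response (the peer's own epoch).
 */
enum gw_action gateway_on_response(uint32_t sys_epoch, uint32_t rp_epoch,
				   enum gw_result result)
{
	if (result != GW_RES_OLD_NODE_VER)
		return GW_DONE;

	/* Still behind the peer: retrying now would only fail again. */
	if (rp_epoch > sys_epoch)
		return GW_WAIT_FOR_EPOCH;

	/* The gateway has caught up (equal or newer), so resend. */
	return GW_RETRY_NOW;
}
```

For the example in the diagram, gateway_on_response(3, 8, GW_RES_OLD_NODE_VER) yields
GW_WAIT_FOR_EPOCH, and once the gateway's epoch has reached 8 the same response yields
GW_RETRY_NOW.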

That's what this code does. I hope I made myself clear.

thanks,

levin


