[sheepdog] [PATCH] sheep: fix handling of too old epoch in check_request
levin li
levin108 at gmail.com
Thu May 31 10:40:31 CEST 2012
On 05/31/2012 04:22 PM, Christoph Hellwig wrote:
> On Thu, May 31, 2012 at 11:00:09AM +0800, levin li wrote:
>> But I don't quite understand your issue (c). A gateway request without SD_FLAG_CMD_IO_LOCAL
>> indeed never receives SD_RES_NEW_NODE_VER, because the peer node waits until its epoch
>> equals the gateway's epoch before it responds.
>
> Sorry if my description of the issue was too short. What I mean is this
> code in io_op_done:
>
> 	if (!(req->rq.flags & SD_FLAG_CMD_IO_LOCAL)) {
> 		if (req->rp.epoch > sys->epoch &&
> 		    req->rp.result == SD_RES_OLD_NODE_VER) {
> 			list_add_tail(&req->request_list,
> 				      &sys->wait_rw_queue);
> 		} else
> 			goto retry;
>
> First, the problem is that it adds the request to wait_rw_queue but still
> completes it; second, I do not quite understand what the test is for.
>
> SD_RES_OLD_NODE_VER means a node rejected the request because sys->epoch
> was newer than req->rq.epoch. The code above tests for a req->rp.epoch
> larger than sys->epoch after that happens, which sounds odd.
>
> I think the gateway should always retry in case of an SD_RES_OLD_NODE_VER
> return from the I/O node, assuming the epoch update has propagated to it
> in the meantime.
Well, there's a bug: we should return after putting the request into wait_rw_queue.
Thanks for pointing it out.
But I still think you misunderstood my purpose, so let me explain:

Gateway (epoch 3) ---------- req (IO_LOCAL) ----------> Peer (epoch 8)
(rp.epoch = 8)    <--------- send back rsp ------------ (set epoch, result)
(rp.result = SD_RES_OLD_NODE_VER)

The peer node finds that the gateway's request carries epoch 3, which is older than its own
system epoch 8. The peer then sets the response epoch to 8 to tell the gateway not to retry the
request until the gateway's system epoch reaches 8, and at the same time sets the result to
SD_RES_OLD_NODE_VER.
When the gateway receives a response with SD_RES_OLD_NODE_VER from the peer node, it
means the request needs to be retried. But when? The gateway checks req->rp.epoch,
which is the peer's epoch. If the gateway's system epoch is still older than the peer's epoch, the
request should not be resent immediately; it is put into wait_rw_queue to wait for a new
epoch. If the gateway's epoch is equal to or newer than the peer's epoch, the request is resent.
That's what this code does. I hope I made myself clear.
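
To make the flow concrete, here is a minimal standalone sketch of the decision on both sides. It is not the sheepdog code: the types, enum values, and function names (peer_check, gateway_on_response) are made up for illustration, and it assumes the peer always reports its own epoch in the response and that any result other than "old node version" simply completes the request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT sheepdog's real constants or structs. */
enum result { RES_SUCCESS, RES_OLD_NODE_VER };
enum action { ACT_COMPLETE, ACT_RETRY_NOW, ACT_WAIT_FOR_EPOCH };

struct response {
	uint32_t epoch;      /* epoch reported by the peer */
	enum result result;
};

/*
 * Peer side: reject a request whose epoch is older than the peer's own,
 * and report the peer's epoch so the gateway knows what to wait for.
 */
static struct response peer_check(uint32_t req_epoch, uint32_t peer_epoch)
{
	struct response rp = { .epoch = peer_epoch, .result = RES_SUCCESS };

	if (req_epoch < peer_epoch)
		rp.result = RES_OLD_NODE_VER;
	return rp;
}

/*
 * Gateway side: decide what to do with the response.
 * - not an "old node version" result: complete the request (assumption)
 * - gateway epoch still behind the peer: park the request until the
 *   epoch changes (the wait_rw_queue case), and do not complete it
 * - gateway epoch has caught up: resend right away
 */
static enum action gateway_on_response(const struct response *rp,
				       uint32_t sys_epoch)
{
	assert(rp != NULL);

	if (rp->result != RES_OLD_NODE_VER)
		return ACT_COMPLETE;
	if (rp->epoch > sys_epoch)
		return ACT_WAIT_FOR_EPOCH;
	return ACT_RETRY_NOW;
}
```

With the numbers from the diagram, peer_check(3, 8) yields epoch 8 and RES_OLD_NODE_VER; the gateway at epoch 3 gets ACT_WAIT_FOR_EPOCH, and once it reaches epoch 8 the same response gives ACT_RETRY_NOW.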
thanks,
levin