[sheepdog] [PATCH] sheep: fix handling of too old epoch in check_request

Thu May 31 10:22:20 CEST 2012

On Thu, May 31, 2012 at 11:00:09AM +0800, levin li wrote:
> But I'm not quite understand your issue (c), for a gateway request that without SD_FLAG_CMD_IO_LOCAL,
> it indeed never receive a SD_RES_NEW_NODE_VER, because peer node will wait until it's epoch get
> equal to gateway's epoch, and then it can response.

Sorry if my description of the issue was too short.  What I mean is this
code in io_op_done:

		if (!(req->rq.flags & SD_FLAG_CMD_IO_LOCAL)) {
			if (req->rp.epoch > sys->epoch &&
			    req->rp.result == SD_RES_OLD_NODE_VER) {
				list_add_tail(&req->request_list,
						&sys->wait_rw_queue);
			} else
				goto retry;

First, the problem is that it adds the request to wait_rw_queue but still
completes it.  Second, I do not quite understand what the test is for.

SD_RES_OLD_NODE_VER means a node rejected the request because sys->epoch
was newer than req->rq.epoch.  The code above tests for a req->rp.epoch
larger than sys->epoch after that happens, which sounds odd.

I think the gateway should always retry when the I/O node returns
SD_RES_OLD_NODE_VER, assuming the epoch update has propagated to the
gateway in the meantime.
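
To make the suggestion concrete, here is a minimal standalone sketch of
the behaviour I have in mind.  It is not the sheep code: the sketch_*
names and enum values are made up for illustration, and it only models
the "request epoch too old" case, not SD_RES_NEW_NODE_VER or the
wait_rw_queue handling.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the SD_RES_* codes; values are arbitrary. */
enum sketch_result {
	SKETCH_RES_SUCCESS,
	SKETCH_RES_OLD_NODE_VER,
};

/* What the gateway does once the reply from the I/O node is in. */
enum sketch_action {
	SKETCH_COMPLETE,
	SKETCH_RETRY,
};

/*
 * I/O node side: reject a request whose epoch is older than the
 * node's own epoch.  Other outcomes are deliberately left out.
 */
static enum sketch_result sketch_check_epoch(uint32_t req_epoch,
					      uint32_t node_epoch)
{
	if (req_epoch < node_epoch)
		return SKETCH_RES_OLD_NODE_VER;
	return SKETCH_RES_SUCCESS;
}

/*
 * Gateway side: always retry on OLD_NODE_VER, without comparing the
 * reply epoch against the local epoch first.
 */
static enum sketch_action sketch_gateway_action(enum sketch_result result)
{
	/* The sketch only models these two results. */
	assert(result == SKETCH_RES_SUCCESS ||
	       result == SKETCH_RES_OLD_NODE_VER);

	if (result == SKETCH_RES_OLD_NODE_VER)
		return SKETCH_RETRY;
	return SKETCH_COMPLETE;
}
```

With this shape a request sent at epoch 3 to a node already at epoch 5
is rejected and then simply resent by the gateway; the retry is expected
to go out with the updated epoch once the gateway has seen the change.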