[sheepdog] [PATCH] sheep: fix handling of too old epoch in check_request

Thu May 31 10:03:12 CEST 2012

On Thu, May 31, 2012 at 09:59:55AM +0800, levin li wrote:
> my original plan is that when a request with old epoch comes, we directly
> send back a response with SD_RES_OLD_NODE_VER in check_request() to make the
> sender retry. Agree with this?

That's what my patch does if we're on a node that doesn't also act as a
gateway.  If we're doing local I/O on a gateway, life obviously isn't
that simple.
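
To make the two cases concrete, here is a rough sketch of the decision,
not the actual patch.  The enum, the function name, the from_local_gateway
parameter and the numeric values of the result codes are all made up for
illustration; only the SD_RES_OLD_NODE_VER name comes from this thread.
What exactly happens in the local gateway case is not spelled out here,
so the sketch only reports it as a separate outcome.

```c
#include <stdint.h>

/* Placeholder values for this sketch; the real ones live in sheepdog's headers. */
#define SD_RES_SUCCESS       0x00
#define SD_RES_OLD_NODE_VER  0x41

/* Hypothetical outcomes of the epoch check. */
enum sketch_action {
	SKETCH_PROCEED,       /* epochs match, go on processing the request */
	SKETCH_REPLY,         /* answer the sender right away with *result */
	SKETCH_LOCAL_GATEWAY, /* old epoch on local gateway I/O: the hard case */
	SKETCH_NOT_COVERED,   /* request epoch newer than ours, not discussed here */
};

/*
 * Decide what to do with a request carrying req_epoch while the node is
 * at sys_epoch.  from_local_gateway is non-zero when the request was
 * issued by a gateway running on this same node (assumed parameter).
 */
static enum sketch_action
sketch_check_epoch(uint32_t req_epoch, uint32_t sys_epoch,
		   int from_local_gateway, uint32_t *result)
{
	*result = SD_RES_SUCCESS;

	if (req_epoch == sys_epoch)
		return SKETCH_PROCEED;
	if (req_epoch > sys_epoch)
		return SKETCH_NOT_COVERED;

	/* From here on the request epoch is too old. */
	if (from_local_gateway)
		return SKETCH_LOCAL_GATEWAY;

	/* Remote sender: tell it to retry with a newer epoch. */
	*result = SD_RES_OLD_NODE_VER;
	return SKETCH_REPLY;
}
```

The point of the split is that a remote sender can simply be told to
retry, while a request that originated on the same node has no remote
peer to bounce the error back to.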

Speaking of which, I really hate all the code required to support
gateways doing local I/O.

Does anyone actually use sheepdog clusters small enough that optimizing
this case matters?  I'd love to basically queue up a new SD_FLAG_CMD_IO_LOCAL
I/O in the gateway if it finds a local node and kill all these special
cases that make life hard.

In fact, I wonder if we should make the gateway and I/O nodes entirely
separate processes, although that would cause quite a bit of trouble for
users initially.

> Note that in check_request(), for a gateway request without SD_FLAG_CMD_IO_LOCAL,
> the epoch of the request is always equal to the system epoch, because we have
> set it to the system epoch in queue_request() before entering check_request().
> 	/*
> 	 * we set epoch for non direct requests here.  Note that we need to
> 	 * sample sys->epoch before passing requests to worker threads as
> 	 * it can change anytime we return to processing membership change
> 	 * events.
> 	 */
> 	if (!(hdr->flags & SD_FLAG_CMD_IO_LOCAL))
> 		hdr->epoch = sys->epoch;

Indeed, I forgot that we also always reset it when requeueing.  That'll
make life a lot simpler.
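
A small sketch of why that helps, under the assumption that every pass
through the queueing path (including a requeue) re-samples the epoch as
in the quoted snippet.  The struct and function names are hypothetical,
and the flag value is a placeholder rather than the one from the real
headers.

```c
#include <stdint.h>

/* Placeholder value for this sketch; see sheepdog's headers for the real one. */
#define SD_FLAG_CMD_IO_LOCAL 0x0010

/* Hypothetical, stripped-down request header. */
struct sketch_hdr {
	uint16_t flags;
	uint32_t epoch;
};

/*
 * Mirrors the quoted snippet: non-direct (gateway) requests get the
 * current system epoch stamped on them each time they are queued.
 */
static void sketch_queue_request(struct sketch_hdr *hdr, uint32_t sys_epoch)
{
	if (!(hdr->flags & SD_FLAG_CMD_IO_LOCAL))
		hdr->epoch = sys_epoch;
}

/* A request is stale when its epoch is older than the system epoch. */
static int sketch_epoch_is_stale(const struct sketch_hdr *hdr,
				 uint32_t sys_epoch)
{
	return hdr->epoch < sys_epoch;
}
```

With this, a gateway request can never look stale right after it has
been queued or requeued, so the old-epoch check only has to worry about
requests that carry SD_FLAG_CMD_IO_LOCAL.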