[sheepdog] read/write during recovery

Tue Jul 24 08:50:53 CEST 2012

> On 07/24/2012 02:13 PM, Dietmar Maurer wrote:
> >> Why can we simply reject read/writes until we start recovering a
> >> specific object?
> >
> > Sorry, the question is:
> >
> > Why can't we simply reject read/writes until we start recovering a specific
> object?
> >
> 
> What do you mean by 'reject'? We can't simply return EIO to the guest; that is
> why we have wait queues, which re-queue requests once certain conditions are met.

That was my question: why can't we reject them? We already do:

		if (is_recovery_init()) {
			req->rp.result = SD_RES_OBJ_RECOVERING;
			/* ... */
		}

So we already rely on the gateway to retry the request, don't we?
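A minimal sketch of the gateway-side retry this relies on, assuming the gateway resends a request whenever the storage node answers with a "still recovering" result. The result codes, `gateway_forward_with_retry()`, the `forward_fn` callback and the `fake_node` stand-in are all hypothetical illustrations, not sheepdog's actual code; the real `SD_RES_OBJ_RECOVERING` value and forwarding path live in the sheepdog sources.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical result codes standing in for sheepdog's SD_RES_* values. */
enum {
	RES_SUCCESS = 0,
	RES_OBJ_RECOVERING = 1,
	RES_EIO = 2,
};

/* Hypothetical transport: forwards one request for oid to the node that
 * stores it and returns that node's result code. */
typedef int (*forward_fn)(uint64_t oid, void *ctx);

/* Forward a request, resending it while the target reports that the object
 * is still being recovered. Returns the last result, so a request fails only
 * if the object is still recovering after max_tries attempts (or the node
 * reports some other error). */
static int gateway_forward_with_retry(uint64_t oid, forward_fn fwd, void *ctx,
				      int max_tries, long delay_ms)
{
	int ret = RES_EIO;

	assert(fwd != NULL);
	for (int i = 0; i < max_tries; i++) {
		ret = fwd(oid, ctx);
		if (ret != RES_OBJ_RECOVERING)
			break;
		if (i + 1 < max_tries) {
			/* Back off before resending the same request. */
			struct timespec ts = {
				.tv_sec = delay_ms / 1000,
				.tv_nsec = (delay_ms % 1000) * 1000000L,
			};
			nanosleep(&ts, NULL);
		}
	}
	return ret;
}

/* Stand-in storage node for illustration: answers "recovering" for its
 * first recovering_for requests, then succeeds. */
struct fake_node {
	int recovering_for;
	int calls;
};

static int fake_forward(uint64_t oid, void *ctx)
{
	struct fake_node *n = ctx;

	(void)oid;
	n->calls++;
	return n->calls <= n->recovering_for ? RES_OBJ_RECOVERING : RES_SUCCESS;
}
```

For example, against a `fake_node` that reports recovery for its first two requests, `gateway_forward_with_retry(42, fake_forward, &n, 5, 0)` succeeds on the third attempt. Bounding the retries keeps a stuck recovery from hanging the guest forever, at the cost of eventually surfacing the error.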

All we need to do is log write requests and apply them once the object has
been recovered. IMHO, that would be much simpler than the current code.
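A minimal sketch of such a per-object write journal, assuming each write carries an offset and length into a fixed-size object and that the journal is replayed only after recovery has brought the object up to date. `struct write_journal`, `wj_append()`, `wj_replay()` and the tiny `OBJ_SIZE` are hypothetical names chosen for illustration. Reads of an object that is still recovering are not covered by this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, tiny object size so the example stays small. */
#define OBJ_SIZE 64

struct wj_entry {
	uint32_t offset;
	uint32_t len;
	unsigned char *data;
	struct wj_entry *next;
};

/* Per-object write journal; entries are kept in arrival order. */
struct write_journal {
	struct wj_entry *head, *tail;
	int count;
};

/* Record a write that arrives while the object is being recovered, so the
 * write can be acknowledged immediately. Returns 0 on success, -1 on an
 * empty or out-of-range write or on allocation failure. */
static int wj_append(struct write_journal *j, uint32_t offset,
		     const void *buf, uint32_t len)
{
	if (len == 0 || offset > OBJ_SIZE || len > OBJ_SIZE - offset)
		return -1;
	struct wj_entry *e = malloc(sizeof(*e));
	if (!e)
		return -1;
	e->data = malloc(len);
	if (!e->data) {
		free(e);
		return -1;
	}
	memcpy(e->data, buf, len);
	e->offset = offset;
	e->len = len;
	e->next = NULL;
	if (j->tail)
		j->tail->next = e;
	else
		j->head = e;
	j->tail = e;
	j->count++;
	return 0;
}

/* After recovery has filled obj, apply journaled writes in arrival order
 * (later writes overwrite earlier ones) and free the journal. */
static void wj_replay(struct write_journal *j, unsigned char obj[OBJ_SIZE])
{
	struct wj_entry *e = j->head;

	while (e) {
		struct wj_entry *next = e->next;

		memcpy(obj + e->offset, e->data, e->len);
		free(e->data);
		free(e);
		e = next;
	}
	j->head = j->tail = NULL;
	j->count = 0;
}
```

Replaying in arrival order preserves the semantics of overlapping writes: a later write to the same range wins, just as if both writes had gone straight to the recovered object.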

> Basically, there are two mechanisms: 1) use wait queues to retry when the
> targeted object is being migrated/recovered; 2) schedule objects that are
> being requested with higher priority than those that aren't.
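The second mechanism can be sketched as a recovery work list in which a requested object jumps to the front of the pending range. This is a hypothetical illustration (`struct recovery_list`, `recovery_prioritize()`, `recovery_pop()` and `MAX_OIDS` are made-up names), not sheepdog's scheduler, and it leaves out the wait queue in which the request itself would sit until its object is recovered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OIDS 16

/* Hypothetical recovery work list: oids[next..count-1] are still pending,
 * in the order they will be recovered. */
struct recovery_list {
	uint64_t oids[MAX_OIDS];
	size_t count;
	size_t next;	/* index of the next object to recover */
};

/* Called when a guest request touches oid. If oid is still pending, move it
 * to the front of the pending range so it is recovered next. Returns true
 * if oid was pending, false if it was already recovered or never listed. */
static bool recovery_prioritize(struct recovery_list *rl, uint64_t oid)
{
	assert(rl->next <= rl->count && rl->count <= MAX_OIDS);
	for (size_t i = rl->next; i < rl->count; i++) {
		if (rl->oids[i] != oid)
			continue;
		/* Shift the objects ahead of it back by one slot. */
		for (size_t k = i; k > rl->next; k--)
			rl->oids[k] = rl->oids[k - 1];
		rl->oids[rl->next] = oid;
		return true;
	}
	return false;
}

/* Take the next object to recover; returns false when the list is drained. */
static bool recovery_pop(struct recovery_list *rl, uint64_t *oid)
{
	if (rl->next >= rl->count)
		return false;
	*oid = rl->oids[rl->next++];
	return true;
}
```

Promoting an object costs a linear scan and shift of the pending range; that keeps the sketch short, but a long recovery list would want an index (for example a hash table from oid to position) instead.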

My suggestion is to use a write journal for writes during recovery. Writes then
simply succeed, and there is no need for the queueing/scheduling code.

> Note, with the consistent hashing algorithm, we actually have just a very small
> set of objects that need to be migrated/recovered; most objects don't need to
> be recovered, they just stay where they are. This means most requests
> during a configuration event will be serviced as normal.

I thought that depends on the number of nodes. For example, if I have 3 nodes
and copies=2, wouldn't about 1/3 of all objects need to be recovered?
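The fraction can be estimated with a small simulation. This is a sketch under stated assumptions: a hypothetical consistent-hash ring with 64 virtual nodes per physical node, the MurmurHash3 `fmix64` finalizer as the hash, and replicas placed on the first `COPIES` distinct nodes clockwise from the object's hash; sheepdog's real ring and hash differ in detail. `count_moves()` and the other names are made up for illustration. It counts how many objects change placement when one of 3 nodes fails with copies=2.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NR_NODES 3
#define VNODES 64	/* hypothetical virtual nodes per physical node */
#define COPIES 2
#define NR_OBJS 100000

struct vnode { uint64_t hash; int node; };

/* 64-bit finalizer from MurmurHash3 (fmix64), used as a cheap hash. */
static uint64_t fmix64(uint64_t k)
{
	k ^= k >> 33;
	k *= 0xff51afd7ed558ccdULL;
	k ^= k >> 33;
	k *= 0xc4ceb9fe1a85ec53ULL;
	k ^= k >> 33;
	return k;
}

static int cmp_vnode(const void *a, const void *b)
{
	uint64_t x = ((const struct vnode *)a)->hash;
	uint64_t y = ((const struct vnode *)b)->hash;
	return (x > y) - (x < y);
}

/* Build a sorted ring from every node except `skip` (-1 keeps all). */
static size_t build_ring(struct vnode *ring, int skip)
{
	size_t n = 0;
	for (int node = 0; node < NR_NODES; node++) {
		if (node == skip)
			continue;
		for (int v = 0; v < VNODES; v++) {
			ring[n].hash = fmix64(((uint64_t)node << 32) | (uint64_t)v);
			ring[n].node = node;
			n++;
		}
	}
	qsort(ring, n, sizeof(*ring), cmp_vnode);
	return n;
}

/* Walk clockwise from the object's hash and collect the first COPIES
 * distinct nodes; these hold the object's replicas. */
static void place(const struct vnode *ring, size_t n, uint64_t oid, int out[COPIES])
{
	/* XOR with a constant keeps object keys apart from vnode keys. */
	uint64_t h = fmix64(oid ^ 0x9e3779b97f4a7c15ULL);
	size_t start = 0;
	int found = 0;

	while (start < n && ring[start].hash < h)
		start++;
	for (size_t i = 0; found < COPIES && i < n; i++) {
		int node = ring[(start + i) % n].node, dup = 0;
		for (int k = 0; k < found; k++)
			dup |= (out[k] == node);
		if (!dup)
			out[found++] = node;
	}
	assert(found == COPIES);
}

/* Count objects whose replica set changes when node `failed` leaves, and
 * how many of them had a replica on that node. */
static void count_moves(int failed, long *changed, long *had_replica)
{
	static struct vnode before[NR_NODES * VNODES], after[NR_NODES * VNODES];
	size_t nb = build_ring(before, -1), na = build_ring(after, failed);

	*changed = *had_replica = 0;
	for (uint64_t oid = 0; oid < NR_OBJS; oid++) {
		int a[COPIES], b[COPIES], diff = 0, hit = 0;
		place(before, nb, oid, a);
		place(after, na, oid, b);
		for (int k = 0; k < COPIES; k++) {
			diff |= (a[k] != b[k]);
			hit |= (a[k] == failed);
		}
		*changed += diff;
		*had_replica += hit;
	}
}
```

Under these assumptions the objects whose placement changes are exactly those that had a replica on the failed node, in the region of two thirds of all objects. Each of them loses one of its two copies, so the data to re-replicate is about one third of all stored replica copies.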

- Dietmar