On 06/04/2012 06:00 PM, Christoph Hellwig wrote:
> I think the right fix is to simply give each recover_object_work() call
> it's own work_struct in a structure also containing the oid. While
> this means a memory allocation per object to be recovered it also means
> complete independence between recovery operations, including kicking off
> onces that have I/O pending ASAP and allowing multiple recoveries in
> parallel. I'm about to leave for a long haul flight and will try to
> implement this solution while I'm on the plane.

Parallel recovery looks attractive. A per-object allocation is fairly
large for the current scheme, because every object in the list is
treated as one to be recovered. In fact we only need to recover the
objects that are supposed to be migrated in, which is a much smaller
set, so we only need to allocate memory for those objects.

But if we really want a better recovery process, we need to take the
bigger picture into consideration:

1: the biggest overhead is actually prepare_object_list(), which tries
   to connect to *all* the nodes in the cluster.

2: the number of objects to be recovered from other nodes is quite
   small for nodes that have already joined, but quite big for a newly
   joining node.

Actually, I have long been thinking of a *reverse* recovery process:
each node scans its back-end store and actively sends copies to the
target nodes that need to recover them. This would be much more
scalable and efficient.

I hope you can take this into consideration.

Thanks,
Yuan
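
For reference, a minimal sketch of the per-object work item Christoph
describes above. The struct layout, the recovery_work/queue_recovery
names, and the commented-out queue_work() call are assumptions made for
illustration only, not the actual sheep code:

    /* Illustrative sketch of a per-object recovery work item. */
    #include <stdint.h>
    #include <stdlib.h>
    #include <stddef.h>

    #define container_of(ptr, type, member) \
            ((type *)((char *)(ptr) - offsetof(type, member)))

    struct work;
    typedef void (*work_func_t)(struct work *);

    struct work {
            work_func_t fn;         /* runs in a worker thread */
            work_func_t done;       /* runs back in the main thread */
    };

    /* one allocation per object to be recovered */
    struct recovery_work {
            uint64_t oid;           /* object to pull from its current holder */
            struct work work;
    };

    static void recover_object_work(struct work *w)
    {
            struct recovery_work *rw = container_of(w, struct recovery_work, work);

            /* read rw->oid from the node that currently holds it,
             * then write it to the local back-end store */
            (void)rw->oid;
    }

    static void recover_object_done(struct work *w)
    {
            struct recovery_work *rw = container_of(w, struct recovery_work, work);

            /* mark rw->oid as recovered, then drop the per-object allocation */
            free(rw);
    }

    /* kick off one independent recovery per oid; objects with pending I/O
     * can be queued first, and several can run in parallel */
    static void queue_recovery(uint64_t oid)
    {
            struct recovery_work *rw = malloc(sizeof(*rw));

            if (!rw)
                    return;
            rw->oid = oid;
            rw->work.fn = recover_object_work;
            rw->work.done = recover_object_done;
            /* queue_work(recovery_wqueue, &rw->work);  -- assumed work-queue API */
    }

With this scheme, each recovery is self-contained, so only the objects
that actually need to be migrated in would ever get an allocation.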