[sheepdog] read/write during recovery

Thu Jul 26 10:13:42 CEST 2012

On 07/26/2012 04:06 PM, Dietmar Maurer wrote:
> But recovery and cleanup actions can take several hours, so it is quite hard to find a window
> on such system?

We are continually optimizing recovery performance. Currently, on a
30-node cluster holding dozens of TB of data, recovery takes less than
30 minutes. Note that recovery can be nested: a subsequent node event
supersedes the previous one. So if two nodes fail one after another,
the total time is t0 + t(r), where t0 is the window between the two
events and t(r) is the recovery time for a single node event.
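
To illustrate the arithmetic, here is a rough sketch in Python. It is a
toy model, not sheepdog code: the function name is made up, and it
assumes every recovery takes the same fixed time t_r and that an event
arriving during a recovery restarts it from scratch.

```python
def total_recovery_time(event_times, t_r):
    """Total time spent in recovery for a list of node event timestamps.

    Assumptions (illustrative only): each recovery takes a constant t_r,
    and an event that arrives mid-recovery supersedes the running one.
    """
    total = 0.0
    start = None       # when the current recovery period began
    busy_until = None  # when the current recovery would finish
    for t in sorted(event_times):
        if busy_until is not None and t < busy_until:
            # Event arrives during recovery: supersede, restart the clock.
            busy_until = t + t_r
        else:
            # Previous recovery (if any) already finished; account for it.
            if busy_until is not None:
                total += busy_until - start
            start = t
            busy_until = t + t_r
    if busy_until is not None:
        total += busy_until - start
    return total


# Two failures 10 minutes apart, t(r) = 30 minutes: t0 + t(r) = 40.
print(total_recovery_time([0, 10], 30))
```

With the second failure arriving after the first recovery has finished
(for example at minute 100), the two recoveries do not nest and the
model simply gives 2 * t(r).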

So yes, in theory we cannot guarantee that recovery time is bounded to
a short window, but when recovery is reported to take hours, I think it
is time for us to revisit the code and make it faster.

Thanks,
Yuan