[sheepdog] [PATCH V2 00/11] INTRODUCE

Dietmar Maurer dietmar at proxmox.com
Tue Aug 21 06:34:05 CEST 2012


> On 08/21/2012 12:07 AM, Christoph Hellwig wrote:
> > Another thing that sprang to mind is that instead of the formal
> > recovery enable/disable we should simply always delay recovery, that
> > is, only do recovery every N seconds if changes happened.
> > Especially in the cases of whole racks going up/down or upgrades that
> > dramatically reduces the number of epochs required, and thus reduces
> > the recovery overhead.
> >
> > I didn't actually have time to look into the implementation
> > implications of this yet, it's just high-level thoughts.
> 
> I don't think we should delay recovery all the time. Delaying recovery
> within a time window for maintenance or operational purposes is useful, so
> delaying it manually during a controlled window makes sense. But if we
> extend this to the whole running time, it leaves the cluster in a less safe
> (if not dangerous) state at any point. (We only upgrade the cluster or
> maintain individual nodes at certain times, not all the time, no?)

I still think that automatic recovery without delay is the wrong approach. At least for
small clusters you simply want to avoid unnecessary traffic. Such recovery can produce
massive traffic on the network (several TB of data), and can make the whole system unusable 
because of that. I want to control when recovery starts.

- Dietmar



