[sheepdog] [PATCH V2 00/11] INTRODUCE

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Tue Aug 21 06:47:30 CEST 2012


At Tue, 21 Aug 2012 04:34:05 +0000,
Dietmar Maurer wrote:
> 
> > On 08/21/2012 12:07 AM, Christoph Hellwig wrote:
> > > Another thing that sprang to mind is that instead of the formal
> > > recovery enable/disable we should simply always delay recovery, that
> > > is, only run recovery every N seconds if changes happened.
> > > Especially in the cases of whole racks going up/down or upgrades, that
> > > dramatically reduces the number of epochs required, and thus reduces
> > > the recovery overhead.
> > >
> > > I didn't actually have time to look into the implementation
> > > implications of this yet; it's just high-level thoughts.
> > 
> > I am against delaying recovery all the time. Delaying recovery within a
> > time window is useful for maintenance or operational purposes, so manually
> > delaying it during a controlled window makes sense. But if we extend this
> > to the whole running time, the cluster will be in a less safe (if not
> > dangerous) state at any point. (We only upgrade the cluster or maintain
> > individual nodes at certain times, not all the time, no?)
> 
> I still think that automatic recovery without delay is the wrong approach. At least for
> small clusters you simply want to avoid unnecessary traffic. Such recovery can produce
> massive traffic on the network (several TB of data), and can make the whole system unusable 
> because of that. I want to control when recovery starts.

Doesn't disabling automatic recovery by default work for you?  You can
control when recovery starts with "collie cluster recover enable".
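
To make the two policies discussed in this thread concrete, here is a
minimal sketch in Python.  It is not sheepdog code: the class
DelayedRecovery and all of its method names are hypothetical, and time is
passed in explicitly as plain seconds.  It only illustrates how several
membership changes could be coalesced into one recovery run, and how a
manual enable could gate recovery instead.

```python
class DelayedRecovery:
    """Hypothetical model: batch membership changes into one recovery run."""

    def __init__(self, delay_seconds, auto_recover=True):
        self.delay_seconds = delay_seconds
        self.auto_recover = auto_recover   # False models "disabled by default"
        self.pending_changes = 0           # changes not yet covered by a recovery
        self.first_change_at = None        # time of the oldest pending change
        self.recoveries = []               # (start_time, changes_covered)

    def node_changed(self, now):
        """Record a node joining or leaving at time `now`."""
        if self.pending_changes == 0:
            self.first_change_at = now
        self.pending_changes += 1

    def tick(self, now):
        """Periodic check: start recovery only once the delay has elapsed."""
        if not self.auto_recover or self.pending_changes == 0:
            return False
        if now - self.first_change_at < self.delay_seconds:
            return False
        self._start_recovery(now)
        return True

    def enable_recovery(self, now):
        """Manual switch: turn recovery on and run it if changes are pending."""
        self.auto_recover = True
        if self.pending_changes:
            self._start_recovery(now)

    def _start_recovery(self, now):
        # One recovery run covers every change accumulated so far.
        self.recoveries.append((now, self.pending_changes))
        self.pending_changes = 0
        self.first_change_at = None
```

In this model, three node changes inside one delay window lead to a single
recovery run, while with auto_recover=False nothing happens until
enable_recovery() is called.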

Thanks,

Kazutaka


