[sheepdog] [sheepdog/sheepdog] e98977: sheep: avoid diskfull caused by recovery process
mitake.hitoshi at gmail.com
Fri May 27 10:17:13 CEST 2016
On Mon, May 16, 2016 at 10:50 PM, AP <sheepdog at inml.weebeastie.net> wrote:
> On Mon, May 02, 2016 at 06:04:53PM +0900, Hitoshi Mitake wrote:
> > On Sun, May 1, 2016 at 12:14 PM, AP <sheepdog at inml.weebeastie.net> wrote:
> > > On Tue, Apr 26, 2016 at 07:20:15PM -0700, Hitoshi Mitake wrote:
> > > > sheep can corrupt its cluster by hitting diskfull during the recovery
> > > > process. To avoid this problem, this patch adds a new option -F to
> > > > dog cluster format. If this option is passed during cluster
> > > > formatting, every sheep process of the cluster skips recovery if
> > > > there is a possibility of diskfull during recovery.
> > >
> > > I'm a little confused and am wondering if I am reading this correctly.
> > >
> > > This sounds like the default is to set up the cluster in such a way
> > > it'll corrupt itself.
> > >
> > > Shouldn't it be the other way around? That the default should leave you
> > > safe and you have the option of running naked through the poison ivy
> > > if that's your idea of fun.
> > >
> > > Or did I miss something?
> > The default setup will corrupt the cluster if there is not enough space
> > for recovery, as you say. However, the new option can result a situation that
> > some objects lack its enough replicas. Maybe adding a new option for
> > killing the cluster itself when there's not enough space would be good.
> Sorry for not replying earlier. Life is a mix of sleep, work and sleep,
> atm. :/
> Unfortunately I don't understand the above. Specifically "can result
> a situation that some objects lack its enough replicas". Lack what
> exactly? :)
> It sounds like the default is to permit overcommit which can result in
> corruption when the space is not there at a critical time. If this is
> the case then this should be a conscious decision made by the admin and
> the default is to go "Your data is precious - have enough space for what
> you want." It'd be the option of least surprise.
> If I'm barking up the wrong tree (possible - I'm not sure what you
> meant in your reply) then my apologies. Would love a clarification,
> time permitting.
> Hopefully I'm not too late in replying.
Sorry for my late reply. As you point out, the default can result in a
corrupted state if there is no space. Turning on the new feature by default
would be reasonable. What do you think?
Anyway, careful capacity planning is required for scalable distributed
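
To make the three behaviours discussed in this thread concrete, here is a
minimal sketch in C. It is not sheepdog's actual code: the enum names,
decide_recovery(), and the free_bytes / needed_bytes parameters are all
hypothetical, and how a node would estimate the space needed for recovery is
left out entirely.

```c
#include <stdint.h>

/* Hypothetical policy names; these are not sheepdog identifiers. */
enum recovery_policy {
	POLICY_ALWAYS_RECOVER,	/* default described in this thread */
	POLICY_SKIP_IF_FULL,	/* behaviour described for format option -F */
	POLICY_HALT_IF_FULL,	/* the "kill the cluster" idea floated above */
};

enum recovery_action {
	ACTION_RECOVER,		/* copy objects to this node */
	ACTION_SKIP,		/* leave some objects with fewer replicas */
	ACTION_HALT,		/* stop serving instead of risking corruption */
};

/*
 * Decide what a node should do before starting recovery.
 * free_bytes:   space currently available on the node (assumed input)
 * needed_bytes: estimated space recovery would consume (assumed input)
 */
enum recovery_action decide_recovery(enum recovery_policy policy,
				     uint64_t free_bytes,
				     uint64_t needed_bytes)
{
	/* Enough room: every policy simply recovers. */
	if (needed_bytes <= free_bytes)
		return ACTION_RECOVER;

	switch (policy) {
	case POLICY_SKIP_IF_FULL:
		return ACTION_SKIP;
	case POLICY_HALT_IF_FULL:
		return ACTION_HALT;
	case POLICY_ALWAYS_RECOVER:
	default:
		/* Recovery proceeds and may hit diskfull part-way through. */
		return ACTION_RECOVER;
	}
}
```

With these assumed names, decide_recovery(POLICY_SKIP_IF_FULL, 10, 20)
returns ACTION_SKIP, while the same call with POLICY_ALWAYS_RECOVER returns
ACTION_RECOVER, which is the risky case AP is asking about.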