[sheepdog] [PATCH v3] sheep: remove master node

Wed Jul 24 11:40:10 CEST 2013

At Wed, 24 Jul 2013 17:20:51 +0800,
Kai Zhang wrote:
> 
> 
> On Jul 24, 2013, at 5:13 PM, Kai Zhang <kyle at zelin.io> wrote:
> 
> > 
> > On Jul 24, 2013, at 2:53 PM, MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp> wrote:
> > 
> >> At Tue, 23 Jul 2013 17:30:03 +0800,
> >> Kai Zhang wrote:
> >>> 
> >>> On Jul 23, 2013, at 4:44 PM, MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp> wrote:
> >>> 
> >>>> Ah, sorry.  The node A doesn't start until the nodes B, C, and D come
> >>>> back.  It is because the latest epoch in the node A includes B, C, and
> >>>> D.
> >>> 
> >>> Well, it seems I didn't fully understand the current implementation of cluster driver.
> >>> 
> >>> A very silly question: if B and C come back but D does not, what is the status of
> >>> the cluster? Can it work, or does it just wait for D?
> >> 
> >> The cluster status will be SD_STATUS_WAIT.  It will wait for the node
> >> D to join Sheepdog if you don't run "collie cluster recover force".
> >> 
> > 
> > Does this mean that sheepdog is not self-healing?
> > Must every persistent failure of a sheep be handled by an administrator?
> 
> Sorry, my description is not correct.
> What I mean is that the sheepdog cluster cannot recover by itself in this scenario.
> And I'm a little disappointed by this.
> Is there a way to solve this?

If the redundancy (number of copies) is 1, it is possible that only
node D has the latest data, so it is not safe to start sheepdog
automatically without node D.

Sheepdog starts only when all the nodes in the previous epoch have
gathered; this is necessary to keep the strong consistency that a
block storage system requires.  We can relax this rule a bit (e.g. it
is okay to start sheepdog in the above example if the redundancy is
larger than one).  It's on my TODO list.
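A minimal Python sketch may make the trade-off concrete. It models the current start rule and one possible relaxation. The function names are hypothetical and this is not sheepdog's code. The relaxed check rests on an assumption: each object keeps nr_copies replicas on distinct nodes of the latest epoch, so fewer than nr_copies missing nodes cannot hold every copy of any object.

```python
def missing_nodes(latest_epoch: set[str], joined: set[str]) -> set[str]:
    """Nodes recorded in the latest epoch that have not rejoined yet."""
    return latest_epoch - joined


def can_start_strict(latest_epoch: set[str], joined: set[str]) -> bool:
    """Current rule: every node of the latest epoch must be back."""
    return not missing_nodes(latest_epoch, joined)


def can_start_relaxed(latest_epoch: set[str], joined: set[str],
                      nr_copies: int) -> bool:
    """Hypothetical relaxation. Assuming each object has nr_copies replicas
    on distinct nodes of the latest epoch, fewer than nr_copies missing
    nodes cannot hold all replicas of any object."""
    return len(missing_nodes(latest_epoch, joined)) < nr_copies


# The scenario from this thread: the latest epoch is {A, B, C, D},
# and A, B and C come back while D does not.
epoch = {"A", "B", "C", "D"}
joined = {"A", "B", "C"}
print(can_start_strict(epoch, joined))      # False: D is missing
print(can_start_relaxed(epoch, joined, 1))  # False: D may hold the only copy
print(can_start_relaxed(epoch, joined, 2))  # True: another copy survives
```

With one copy the relaxed check still refuses to start, which matches the reasoning above about node D possibly holding the only up-to-date data.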

Thanks,

Kazutaka