[Sheepdog] [PATCH v2] sheep: tame sheep to recover the
MORITA Kazutaka
morita.kazutaka at lab.ntt.co.jp
Tue Sep 27 06:52:51 CEST 2011
At Tue, 27 Sep 2011 11:43:27 +0800,
Yibin Shen wrote:
> If the latest epoch is unrecoverable, or is a transient epoch,
> should it fall back to the last available epoch?
Yes. There is no other way to recover the cluster.
Note that "collie cluster recover" would be a dangerous operation.
For example,
$ sheep /store/0 -p 7000
$ sheep /store/1 -p 7001
$ collie cluster format
$ pkill -f "sheep /store/0"
$ collie vdi create test 4G # vdi will be created only on the second node
$ collie cluster shutdown
$ sheep /store/0 -p 7000
$ collie cluster recover # start Sheepdog with only the first node
then Sheepdog starts working, but the vdi "test" will be discarded,
because it exists only on the second node.
In the future, I want a force option for "cluster format" and "cluster
recover".
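
To make the risk concrete, here is a toy Python model of the sequence
above. It is only a sketch, not Sheepdog code: the Node class,
bump_epoch() and force_recover() are hypothetical names, and it assumes
that every membership change increments the epoch and that a forced
recovery trusts only the nodes that were actually started.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for one sheep daemon and its local store (hypothetical)."""
    name: str
    epoch: int = 0                              # latest epoch recorded locally
    vdis: set = field(default_factory=set)      # vdi names this node knows about


def bump_epoch(live_nodes):
    """Assumption: any membership change moves all live nodes to a new epoch."""
    new_epoch = max(n.epoch for n in live_nodes) + 1
    for n in live_nodes:
        n.epoch = new_epoch
    return new_epoch


def force_recover(started_nodes):
    """Hypothetical forced recovery: fall back to the latest epoch that is
    available on the started nodes, then continue with the epoch plus one."""
    base = max(n.epoch for n in started_nodes)
    newest = [n for n in started_nodes if n.epoch == base]
    visible = set().union(*(n.vdis for n in newest))
    for n in started_nodes:
        n.vdis = set(visible)
    return bump_epoch(started_nodes), visible


node0, node1 = Node("store0"), Node("store1")
bump_epoch([node0, node1])      # "cluster format": both nodes record epoch 1
bump_epoch([node1])             # node0 is killed: only node1 records epoch 2
node1.vdis.add("test")          # "vdi create test": lands on node1 only
# "cluster shutdown", then only node0 is started again
recovered_epoch, visible = force_recover([node0])
print(recovered_epoch, sorted(visible))
```

In this model the recovered cluster ends up at epoch 2 with no vdis,
while the node that was left out also holds an epoch 2 that contains
"test". The two histories have diverged, which is why such a command
should be a last resort.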
Thanks,
Kazutaka
>
> Yibin Shen
>
> On Tue, Sep 27, 2011 at 11:13 AM, MORITA Kazutaka <
> morita.kazutaka at lab.ntt.co.jp> wrote:
>
> > At Tue, 27 Sep 2011 09:45:49 +0800,
> > Liu Yuan wrote:
> > >
> > > On 09/27/2011 06:09 AM, MORITA Kazutaka wrote:
> > > > At Mon, 26 Sep 2011 11:43:34 -0700 (PDT),
> > > > Ski Mountain wrote:
> > > > >> What happens if one of the nodes in the cluster is not recoverable
> > > > >> at all, i.e. a fried motherboard? Can you just start up the VMs that
> > > > >> were on the dead machine on another machine in the cluster?
> > > > If the unrecoverable node doesn't have the latest epoch info, we need
> > > > to do nothing special. If you start the sheep daemon on all other
> > > > machines, then the cluster will work again.
> > > >
> > > > > But if the failed node has the latest epoch, this is the case where
> > > > > we need a manual recovery, because there is a risk of data loss in
> > > > > this case, though I think this rarely happens.
> > > >
> > > >
> > >
> > > Hi Kazutaka,
> > > > I do have an idea like 'collie cluster recover' in mind. This kind
> > > > of brute-force manual recovery would be the last resort for handling
> > > > a physical failure of the highest-epoch node in a crashed cluster, or
> > > > physical node failures in a shut-down cluster.
> >
> > Good point.
> >
> > >
> > > > The implementation might be rather easy. I am thinking of adding a
> > > > new SD_MSG_RECOVERY event and broadcasting it to recover the
> > > > cluster with the epoch incremented by 1. What do you think?
> >
> > How about adding a new operation SD_OP_CLUSTER_RECOVERY and
> > broadcasting it with SD_MSG_VDI_OP? I think it should work like a
> > "collie cluster format" command.
> >
> >
> > Thanks,
> >
> > Kazutaka
> > --
> > sheepdog mailing list
> > sheepdog at lists.wpkg.org
> > http://lists.wpkg.org/mailman/listinfo/sheepdog
> >