[Sheepdog] [PATCH v2] sheep: tame sheep to recover the
kkbaal at gmail.com
Tue Sep 27 05:43:27 CEST 2011
If the latest epoch is unrecoverable, or is a transient epoch,
should it fall back to the last available epoch?
On Tue, Sep 27, 2011 at 11:13 AM, MORITA Kazutaka <
morita.kazutaka at lab.ntt.co.jp> wrote:
> At Tue, 27 Sep 2011 09:45:49 +0800,
> Liu Yuan wrote:
> > On 09/27/2011 06:09 AM, MORITA Kazutaka wrote:
> > > At Mon, 26 Sep 2011 11:43:34 -0700 (PDT),
> > > Ski Mountain wrote:
> > >> What happens if one of the nodes in the cluster is not recoverable at
> > >> all, e.g. a fried motherboard? Can you just start up the VMs that were
> > >> on the dead machine on another machine in the cluster?
> > > If the unrecoverable node doesn't have the latest epoch info, we don't
> > > need to do anything special. If you start the sheep daemon on all the
> > > other machines, the cluster will work again.
> > >
> > > But if the failed node has the latest epoch, this is the case where we
> > > need a manual recovery, because there is a risk of data loss, though I
> > > think this rarely happens.
> > >
> > >
> > Hi Kazutaka,
> > I have had an idea like 'collie cluster recover' in my head for a
> > while. This kind of brute-force manual recovery would be the last
> > resort for handling a physical failure of the highest-epoch node in a
> > crashed cluster, or physical node failures in a shut-down cluster.
> Good point.
> > The implementation might be rather easy. I am thinking of adding a
> > new SD_MSG_RECOVERY event and broadcasting it to recover the
> > cluster with the epoch incremented by 1. What do you think?
> How about adding a new operation SD_OP_CLUSTER_RECOVERY and
> broadcasting it with SD_MSG_VDI_OP? I think it should work like the
> "collie cluster format" command.
> sheepdog mailing list
> sheepdog at lists.wpkg.org