If the latest epoch is unrecoverable, or is a transient epoch, should it
fall back to the last available epoch?

Yibin Shen

On Tue, Sep 27, 2011 at 11:13 AM, MORITA Kazutaka
<morita.kazutaka at lab.ntt.co.jp> wrote:

> At Tue, 27 Sep 2011 09:45:49 +0800,
> Liu Yuan wrote:
> >
> > On 09/27/2011 06:09 AM, MORITA Kazutaka wrote:
> > > At Mon, 26 Sep 2011 11:43:34 -0700 (PDT),
> > > Ski Mountain wrote:
> > >> What happens if one of the nodes in the cluster is not recoverable
> > >> at all, i.e. a fried motherboard? Can you just start up the VMs
> > >> that were on the dead machine on another machine in the cluster?
> > >
> > > If the unrecoverable node doesn't have the latest epoch info, we
> > > need to do nothing special. If you start the sheep daemon on all
> > > other machines, then the cluster will work again.
> > >
> > > But if the failed node has the latest epoch, this is the case where
> > > we need a manual recovery. It is because there is a risk of data
> > > loss in this case, though I think this rarely happens.
> > >
> >
> > Hi Kazutaka,
> >     I do have an idea like 'collie cluster recover' hanging around in
> > my head. This kind of brute-force manual recovery would be the last
> > resort to handle a physical failure of the highest-epoch node in a
> > crashed cluster, or physical node failures in a shut-down cluster.
>
> Good point.
>
> >
> >     The implementation might be rather easy. I am thinking of adding a
> > new SD_MSG_RECOVERY event and broadcasting this event to recover the
> > cluster with the epoch incremented by 1. What do you think of it?
>
> How about adding a new operation SD_OP_CLUSTER_RECOVERY and
> broadcasting it with SD_MSG_VDI_OP? I think it should work like a
> "collie cluster format" command.
>
>
> Thanks,
>
> Kazutaka
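
To make the question concrete, here is a rough sketch of the fallback I have in mind. It is not sheepdog code: the function name and the node-to-epoch map are hypothetical, and it simply assumes that a manual recovery would take the newest epoch still stored on any surviving node and restart the cluster at that epoch plus 1, as suggested in the quoted proposal.

```python
from typing import Dict, Optional


def pick_recovery_epoch(latest_epoch_by_node: Dict[str, int]) -> Optional[int]:
    """Return the epoch a manually recovered cluster would start from.

    latest_epoch_by_node is a hypothetical map from each *surviving* node
    to the newest epoch that node still has on disk. The node that held
    the true latest epoch is assumed to be lost and is not in the map.
    """
    if not latest_epoch_by_node:
        # No surviving node has any epoch info, so nothing to fall back to.
        return None

    # Fall back to the last epoch that is still available somewhere.
    last_available = max(latest_epoch_by_node.values())

    # Restart the cluster one epoch later, so the recovered state is
    # distinguishable from everything that existed before the failure.
    return last_available + 1


if __name__ == "__main__":
    survivors = {"node-a": 5, "node-b": 6}
    print(pick_recovery_epoch(survivors))
```

Whether a transient epoch on a survivor should count as "available" here is exactly the part I am unsure about; the sketch treats every stored epoch the same.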