[Sheepdog] [PATCH v2] sheep: tame sheep to recover the

Yibin Shen zituan at taobao.com
Tue Sep 27 05:18:51 CEST 2011


If the latest epoch is unrecoverable, or is a transient epoch,
should the cluster fall back to the last available epoch?

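To make the question concrete, here is a minimal sketch in Python (not sheepdog code). It assumes, purely for illustration, that each node records the highest epoch it has seen, and that manual recovery is needed only when no surviving node holds the latest epoch. The function name plan_recovery and the node_epochs mapping are hypothetical.

```python
from typing import Dict, Set, Tuple


def plan_recovery(node_epochs: Dict[str, int], failed: Set[str]) -> Tuple[bool, int]:
    """Return (needs_manual_recovery, fallback_epoch).

    node_epochs maps a node name to the highest epoch that node recorded.
    This bookkeeping is a hypothetical stand-in, not sheepdog's epoch log.
    """
    # Keep only the nodes that can still be started.
    survivors = {name: epoch for name, epoch in node_epochs.items()
                 if name not in failed}
    if not survivors:
        raise ValueError("no surviving node holds any epoch information")

    latest = max(node_epochs.values())   # latest epoch the cluster reached
    fallback = max(survivors.values())   # last epoch still available

    # Assumption: if a survivor still holds the latest epoch, nothing
    # special is needed; otherwise the cluster would have to fall back.
    return fallback < latest, fallback
```

With epochs {"a": 4, "b": 4, "c": 5} and node "c" lost, the sketch reports that manual recovery is needed and that epoch 4 is the last available one.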

Yibin Shen

On Tue, Sep 27, 2011 at 11:13 AM, MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp> wrote:
At Tue, 27 Sep 2011 09:45:49 +0800,
Liu Yuan wrote:
>
> On 09/27/2011 06:09 AM, MORITA Kazutaka wrote:
> > At Mon, 26 Sep 2011 11:43:34 -0700 (PDT),
> > Ski Mountain wrote:
> >> What happens if one of the nodes in the cluster is not recoverable at all, e.g. a fried motherboard? Can you just start up the VMs that were on the dead machine on another machine in the cluster?
> > If the unrecoverable node doesn't have the latest epoch info, we need
> > to do nothing special.  If you start the sheep daemon on all other
> > machines, then the cluster will work again.
> >
> > But if the failed node has the latest epoch, this is the case where we
> > need a manual recovery.  That is because there is a risk of data loss in
> > this case, though I think this rarely happens.
> >
> >
>
> Hi Kazutaka,
>      I do have an idea like 'collie cluster recover' floating around in
> my head. This kind of brute-force manual recovery would be the last
> resort for handling physical failure of the highest-epoch node in a crashed
> cluster, or physical node failures in a shut-down cluster.

Good point.

>
>      The implementation might be rather easy. I am thinking of adding a
> new SD_MSG_RECOVERY event and broadcasting this event to recover the
> cluster with the epoch incremented by 1. What do you think of it?

How about adding a new operation SD_OP_CLUSTER_RECOVERY and
broadcasting it with SD_MSG_VDI_OP?  I think it should work like a
"collie cluster format" command.

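As a rough illustration of the recovery step discussed above (restart the cluster from the surviving nodes with the epoch incremented by 1), here is a minimal sketch in Python. It is not sheepdog code: ClusterView and cluster_recover are hypothetical names, and the broadcast of the operation to the other nodes is deliberately left out.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class ClusterView:
    """Hypothetical snapshot of a cluster: an epoch and its member nodes."""
    epoch: int
    members: FrozenSet[str]


def cluster_recover(current: ClusterView, alive: Iterable[str]) -> ClusterView:
    """Build the view a forced recovery would install.

    The new view keeps only the live nodes and bumps the epoch by one, so
    the recovered membership is recorded as a new epoch rather than
    overwriting the old one.
    """
    alive_set = frozenset(alive)
    if not alive_set:
        raise ValueError("cannot recover a cluster with no live nodes")
    return ClusterView(epoch=current.epoch + 1, members=alive_set)
```

For example, recovering from a last available view at epoch 4 with members a, b and c, where only a and b are alive, yields a view at epoch 5 containing a and b.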

Thanks,

Kazutaka
--
sheepdog mailing list
sheepdog at lists.wpkg.org
http://lists.wpkg.org/mailman/listinfo/sheepdog