[Sheepdog] [PATCH v2] sheep: tame sheep to recover the
Liu Yuan
namei.unix at gmail.com
Tue Sep 27 03:45:49 CEST 2011
On 09/27/2011 06:09 AM, MORITA Kazutaka wrote:
> At Mon, 26 Sep 2011 11:43:34 -0700 (PDT),
> Ski Mountain wrote:
>> What happens if one of the nodes in the cluster is not recoverable at all, e.g. a fried motherboard? Can you just start up the VMs that were on the dead machine on another machine in the cluster?
> If the unrecoverable node doesn't have the latest epoch info, we need
> to do nothing special. If you start the sheep daemon on all other
> machines, then the cluster will work again.
>
> But if the failed node has the latest epoch, this is the case where we
> need a manual recovery, because there is a risk of data loss in this
> case, though I think this rarely happens.
>
>
Hi Kazutaka,
I do have an idea like 'collie cluster recover' floating around in
my head. This kind of brute-force manual recovery would be the last
resort for handling the physical failure of the highest-epoch node in a
crashed cluster, or physical node failures in a shut-down cluster.
The implementation might be rather easy. I am thinking of adding a
new SD_MSG_RECOVERY event and broadcasting it to recover the
cluster with the epoch incremented by 1. What do you think?
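A rough sketch of how the epoch bump might look, purely for illustration.
Nothing here comes from the sheep source: the struct, its fields and both
function names are hypothetical, the "broadcast" is just a loop over an
in-memory array, and it assumes the new epoch is taken from the highest
epoch still recorded on a surviving node, which this mail does not specify.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-node state; not the real sheep data structure. */
struct sketch_node {
	uint32_t epoch;  /* latest epoch this node has recorded */
	bool alive;      /* false if the machine is physically gone */
};

/* Highest epoch recorded by any surviving node (0 if none survive). */
static uint32_t highest_surviving_epoch(const struct sketch_node *nodes,
					size_t n)
{
	uint32_t max = 0;

	for (size_t i = 0; i < n; i++)
		if (nodes[i].alive && nodes[i].epoch > max)
			max = nodes[i].epoch;
	return max;
}

/*
 * Stand-in for handling the proposed SD_MSG_RECOVERY event on every
 * surviving node: each one moves to (highest surviving epoch + 1).
 * Dead nodes are left untouched. Returns the new epoch.
 */
static uint32_t broadcast_recovery(struct sketch_node *nodes, size_t n)
{
	uint32_t next = highest_surviving_epoch(nodes, n) + 1;

	for (size_t i = 0; i < n; i++)
		if (nodes[i].alive)
			nodes[i].epoch = next;
	return next;
}
```

With survivors at epochs 3 and 4 and a dead node at epoch 6, this would
move both survivors to epoch 5. Whether the base should instead be the
last epoch the cluster reached is an open question in this sketch.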
Thanks,
Yuan