[Sheepdog] [PATCH 1/3] collie: add manual recover subcommand for cluster

Liu Yuan namei.unix at gmail.com
Mon Oct 24 09:27:55 CEST 2011


On 10/24/2011 02:11 PM, MORITA Kazutaka wrote:

> At Sat, 22 Oct 2011 13:32:42 +0800,
> Liu Yuan wrote:
>>
>> From: Liu Yuan <tailai.ly at taobao.com>
>>
>> Currently, the sheepdog cluster cannot be recovered under the following conditions:
>>
>> 1) the master node is physically down after the cluster crashes with
>>    different epochs during recovery.
>> 2) some of the nodes are physically down after the cluster is shut down
>>    during recovery.
>>
>> This patch adds a manual recovery mechanism. With this patch, you can manually
>> recover the cluster from any live node by running:
>>
>> $ collie cluster recover
>>
>> [Use with Caution]
>>
>> This command increments the cluster epoch by 1!
>>
>> For case 1), you first need to try to start up the nodes in sequence for the
>> first round until the master node is up, thanks to the mastership mechanism. If
>> that fails, you can simply run the recover command. After that, you can freely
>> join the other good nodes.
>>
>> For case 2), you'd better try to start up all the nodes to see whether any of them
>> are physically down. If some are, you can simply run the recover command.
> 
> How about printing a warning prompt before doing cluster recovery?
> I guess newbies could run 'collie cluster recover' by mistake without
> first finding the previous master node.
> 
> Thanks,
> 
> Kazutaka

Okay, thanks for your review. I'll prepare a V2 patch.

Thanks,
Yuan
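A minimal C sketch of the warning prompt Kazutaka suggests, for illustration only: `confirm_recover()` is a hypothetical helper, not code from the sheepdog tree or from the V2 patch, and the warning text is an assumption. It prints the warning, reads one line, and treats anything other than "y" or "yes" (including EOF) as a refusal.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: warn the operator and ask for confirmation
 * before a manual cluster recovery. Returns true only for "y" or "yes".
 */
bool confirm_recover(FILE *in, FILE *out)
{
	char buf[16];

	fprintf(out,
		"WARNING: 'collie cluster recover' increments the cluster epoch by 1.\n"
		"Make sure the previous master node cannot be started first.\n"
		"Continue? [yes/no]: ");
	fflush(out);

	/* EOF or read error: default to "no" */
	if (!fgets(buf, sizeof(buf), in))
		return false;

	/* strip the trailing newline, if any */
	buf[strcspn(buf, "\n")] = '\0';

	return strcmp(buf, "yes") == 0 || strcmp(buf, "y") == 0;
}
```

A caller such as the recover subcommand would abort when this returns false; defaulting to "no" on EOF keeps non-interactive invocations from triggering a recovery by accident.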


