On 10/24/2011 02:11 PM, MORITA Kazutaka wrote:
> At Sat, 22 Oct 2011 13:32:42 +0800,
> Liu Yuan wrote:
>>
>> From: Liu Yuan <tailai.ly at taobao.com>
>>
>> Currently, the sheepdog cluster cannot be recovered under the following
>> conditions:
>>
>> 1) the master node is physically down after the cluster crashes with
>> different epochs during recovery.
>> 2) some of the nodes are physically down after the cluster is shut down
>> during recovery.
>>
>> This patch adds a manual recovery mechanism. With this patch, you can
>> manually recover the cluster from any live node by running:
>>
>> $ collie cluster recover
>>
>> [Use with Caution]
>>
>> This command will increment the cluster epoch by 1!
>>
>> For case 1), first try to start up the nodes in sequence until the master
>> node is up, thanks to the mastership mechanism. If that does not work, you
>> can simply run the recover command. After that, you can freely let the
>> other good nodes join.
>>
>> For case 2), first try to start up all the nodes to see whether any of them
>> are physically down. If any are, you can simply run the recover command.
>
> How about prompting a warning message before doing cluster recovery?
> I guess newbies could run 'collie cluster recover' by mistake without
> first locating the previous master node.
>
> Thanks,
>
> Kazutaka

Okay, thanks for your review. I'll prepare a V2 patch.

Thanks,
Yuan