[Sheepdog] [PATCH RFC 2/2] sheep: teach sheepdog to better recover the shut-down cluster
MORITA Kazutaka
morita.kazutaka at lab.ntt.co.jp
Tue Sep 20 12:03:20 CEST 2011
At Tue, 20 Sep 2011 17:33:03 +0800,
Liu Yuan wrote:
>
> On 09/20/2011 04:30 PM, MORITA Kazutaka wrote:
> > Looks great, but there seems to be some other cases we need to
> > consider. For example:
> >
> > 1. Start Sheepdog with three daemons.
> > $ for i in 0 1 2; do sheep /store/$i -z $i -p 700$i; sleep 1; done
> > $ collie cluster format
> > $ collie cluster info
> > Cluster status: running
> >
> > Creation time Epoch Nodes
> > 2011-09-20 16:43:10 1 [10.68.14.1:7000, 10.68.14.1:7001, 10.68.14.1:7002]
> >
> > 2. Then, kill sheep daemons, and start again in the same order.
> >
> > $ for i in 0 1 2; do pkill -f "sheep /store/$i"; sleep 1; done
> > $ for i in 0 1 2; do ./sheep/sheep /store/$i -z $i -p 700$i; sleep 1; done
> > $ collie cluster info
> > Cluster status: running
> >
> > Creation time Epoch Nodes
> > 2011-09-20 16:43:10 2 [10.68.14.1:7000]
> > 2011-09-20 16:43:10 1 [10.68.14.1:7000, 10.68.14.1:7001, 10.68.14.1:7002]
> >
> > The first daemon regards the other two nodes as left nodes, and starts
> > working.
> >
> > 3. Start the other two nodes again.
> >
> > $ for i in 1 2; do ./sheep/sheep /store/$i -z $i -p 700$i; sleep 1; done
> > $ collie cluster info
> > Cluster status: running
> >
> > Creation time Epoch Nodes
> > 2011-09-20 16:43:10 4 [10.68.14.1:7000, 10.68.14.1:7001, 10.68.14.1:7002]
> > 2011-09-20 16:43:10 3 [10.68.14.1:7000, 10.68.14.1:7001]
> > 2011-09-20 16:43:10 2 [10.68.14.1:7000]
> > 2011-09-20 16:43:10 1 [10.68.14.1:7000, 10.68.14.1:7001, 10.68.14.1:7002]
> > $ collie cluster info -p 7001
> > Cluster status: running
> >
> > Creation time Epoch Nodes
> > 2011-09-20 16:43:10 4 [10.68.14.1:7000, 10.68.14.1:7001, 10.68.14.1:7002]
> > 2011-09-20 16:43:10 3 [10.68.14.1:7000, 10.68.14.1:7001]
> > 2011-09-20 16:43:10 2 [10.68.14.1:7000, 10.68.14.1:7002]
> > 2011-09-20 16:43:10 1 [10.68.14.1:7000, 10.68.14.1:7001, 10.68.14.1:7002]
> > $ collie cluster info -p 7002
> > Cluster status: running
> >
> > Creation time Epoch Nodes
> > 2011-09-20 16:43:10 4 [10.68.14.1:7000, 10.68.14.1:7001, 10.68.14.1:7002]
> > 2011-09-20 16:43:10 3 [10.68.14.1:7000, 10.68.14.1:7001]
> > 2011-09-20 16:43:10 2 [10.68.14.1:7001, 10.68.14.1:7002]
> > 2011-09-20 16:43:10 1 [10.68.14.1:7000, 10.68.14.1:7001, 10.68.14.1:7002]
> >
> > The epoch information becomes inconsistent. This is because the first
> > node overwrote the epochs on the other nodes. Similar situations
> > could happen if we start from a daemon which doesn't have the latest
> > epoch.
> >
> > We could get away with claiming that this doesn't happen if the
> > administrator is careful enough. But is there any good way to solve
> > this problem?
> >
>
> Good catch. But actually, this patch set doesn't deal with the problem
> of an older or newer epoch when the cluster is started up.
>
> This patch just resolves the cluster startup problem when the nodes are
> *shut down* by the 'collie cluster shutdown' command. That is, the epoch
> number is the same, but the epoch content is corrupted or the ctime differs.
>
> I think this case (all nodes going down abnormally instead of being shut
> down cleanly, e.g. a power outage) should be solved by another patch,
> because it is, IMHO, a different problem.
Probably, we should store information about the node shutdown status
(safely shut down or unexpectedly stopped) so that we can take a
different approach when starting up. Though this need not be done in
this patch.
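
Just to illustrate the idea, a rough sketch of such a marker could look
like the code below. Note that the marker path and the function names
are only placeholders made up for illustration; they are not existing
sheep code.

/*
 * Rough sketch, not actual sheep code: record whether the last stop
 * was a clean shutdown, and check it at startup.
 */
#include <fcntl.h>
#include <unistd.h>

/* hypothetical location inside the store directory */
#define SHUTDOWN_MARKER "/store/0/.clean_shutdown"

/* Called on the 'collie cluster shutdown' path, just before exiting. */
static int write_shutdown_marker(void)
{
	int fd = open(SHUTDOWN_MARKER, O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd < 0)
		return -1;
	close(fd);
	return 0;
}

/*
 * Called at startup.  If the marker exists, the previous stop was a
 * clean shutdown and the stored epoch can be trusted; otherwise the
 * node stopped unexpectedly (e.g. a power outage) and needs a more
 * careful recovery path.
 */
static int was_clean_shutdown(void)
{
	int clean = (access(SHUTDOWN_MARKER, F_OK) == 0);

	if (clean)
		unlink(SHUTDOWN_MARKER);	/* consume the marker */
	return clean;
}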
>
> Nodes with a newer or older epoch should *not* be regarded as
> _leave nodes_; they should be processed as soon as they are started up.
> Though, this patch set wrongly takes them as leave nodes.
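
For what it's worth, the join-time decision could then be driven by a
simple epoch comparison, roughly like the sketch below. The enum and
function names are placeholders for illustration, not existing sheep
code.

#include <stdint.h>

/* Rough sketch, not actual sheep code. */
enum join_action {
	JOIN_AS_MEMBER,		/* epochs match: join normally */
	JOIN_NEEDS_UPDATE,	/* joining node is behind: bring its epoch up to
				   date instead of marking it as a leave node */
	JOIN_HAS_NEWER_EPOCH,	/* joining node is ahead: adopt its epoch
				   history instead of overwriting it with ours */
};

static enum join_action decide_join(uint32_t local_epoch, uint32_t joining_epoch)
{
	if (joining_epoch == local_epoch)
		return JOIN_AS_MEMBER;
	if (joining_epoch < local_epoch)
		return JOIN_NEEDS_UPDATE;
	return JOIN_HAS_NEWER_EPOCH;
}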
>
> I'll cook a different patch targeting this problem, based on this
> shutdown patch set.
Good! Thanks a lot.
Kazutaka