[Sheepdog] [PATCH RFC 2/2] sheep: teach sheepdog to better recovery the shut-down cluster
Liu Yuan
namei.unix at gmail.com
Wed Sep 21 05:48:37 CEST 2011
On 09/20/2011 04:30 PM, MORITA Kazutaka wrote:
> Looks great, but there seems to be some other cases we need to
> consider. For example:
>
> 1. Start Sheepdog with three daemons.
> $ for i in 0 1 2; do sheep /store/$i -z $i -p 700$i; sleep 1; done
> $ collie cluster format
> $ collie cluster info
> Cluster status: running
>
> Creation time Epoch Nodes
> 2011-09-20 16:43:10 1 [10.68.14.1:7000, 10.68.14.1:7001, 10.68.14.1:7002]
>
> 2. Then, kill sheep daemons, and start again in the same order.
>
> $ for i in 0 1 2; do pkill -f "sheep /store/$i"; sleep 1; done
> $ for i in 0 1 2; do ./sheep/sheep /store/$i -z $i -p 700$i; sleep 1; done
> $ collie cluster info
> Cluster status: running
>
> Creation time Epoch Nodes
> 2011-09-20 16:43:10 2 [10.68.14.1:7000]
> 2011-09-20 16:43:10 1 [10.68.14.1:7000, 10.68.14.1:7001, 10.68.14.1:7002]
>
> The first daemon regards the other two nodes as having left, and
> starts working.
>
> 3. Start the other two nodes again.
>
> $ for i in 1 2; do ./sheep/sheep /store/$i -z $i -p 700$i; sleep 1; done
> $ collie cluster info
> Cluster status: running
>
> Creation time Epoch Nodes
> 2011-09-20 16:43:10 4 [10.68.14.1:7000, 10.68.14.1:7001, 10.68.14.1:7002]
> 2011-09-20 16:43:10 3 [10.68.14.1:7000, 10.68.14.1:7001]
> 2011-09-20 16:43:10 2 [10.68.14.1:7000]
> 2011-09-20 16:43:10 1 [10.68.14.1:7000, 10.68.14.1:7001, 10.68.14.1:7002]
> $ collie cluster info -p 7001
> Cluster status: running
>
> Creation time Epoch Nodes
> 2011-09-20 16:43:10 4 [10.68.14.1:7000, 10.68.14.1:7001, 10.68.14.1:7002]
> 2011-09-20 16:43:10 3 [10.68.14.1:7000, 10.68.14.1:7001]
> 2011-09-20 16:43:10 2 [10.68.14.1:7000, 10.68.14.1:7002]
> 2011-09-20 16:43:10 1 [10.68.14.1:7000, 10.68.14.1:7001, 10.68.14.1:7002]
> $ collie cluster info -p 7002
> Cluster status: running
>
> Creation time Epoch Nodes
> 2011-09-20 16:43:10 4 [10.68.14.1:7000, 10.68.14.1:7001, 10.68.14.1:7002]
> 2011-09-20 16:43:10 3 [10.68.14.1:7000, 10.68.14.1:7001]
> 2011-09-20 16:43:10 2 [10.68.14.1:7001, 10.68.14.1:7002]
> 2011-09-20 16:43:10 1 [10.68.14.1:7000, 10.68.14.1:7001, 10.68.14.1:7002]
>
> The epoch information becomes inconsistent. It is because the first
> node overwrote the epochs on the other nodes. Similar situations
> could happen if we start from a daemon which doesn't have the latest
> epoch.
>
> We can get away with claiming that this doesn't happen if the
> administrator is careful enough. But is there any good idea to solve
> this problem?
>
I am really puzzled by the semantics of 'collie cluster info'. From the
code, it reads only the local epoch information, but by its name the
command should report cluster-wide information. Every node keeps its
own history, so its epoch history may *differ* from that of other
nodes.
So I think we should get the same epoch history from any node of the
cluster.
Kazutaka, what do you think about getting the cluster info from a
single node only (the master node, in my opinion)? If that is possible,
how do we deal with the local epoch on a node that is not the master?
If not, we will suffer the epoch inconsistency you ran into; we cannot
get rid of this inconsistency in *every* case.
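To make the inconsistency concrete, here is a minimal Python sketch (not
sheepdog code; the dictionary layout and the helper name
diverging_epochs are only illustrative assumptions). It takes the three
per-node histories from the quoted 'collie cluster info -p' output
above, with nodes identified by port, and reports the epochs on which
the nodes disagree:

```python
# Epoch histories copied from the quoted output above, keyed by the port
# that 'collie cluster info -p' was pointed at. Each history maps an
# epoch number to the list of node ports recorded for that epoch.
histories = {
    7000: {1: [7000, 7001, 7002], 2: [7000],
           3: [7000, 7001], 4: [7000, 7001, 7002]},
    7001: {1: [7000, 7001, 7002], 2: [7000, 7002],
           3: [7000, 7001], 4: [7000, 7001, 7002]},
    7002: {1: [7000, 7001, 7002], 2: [7001, 7002],
           3: [7000, 7001], 4: [7000, 7001, 7002]},
}


def diverging_epochs(histories):
    """Return the sorted epochs whose node list differs between nodes.

    Hypothetical helper for illustration only; a missing epoch on a node
    is treated as an empty node list.
    """
    all_epochs = set()
    for history in histories.values():
        all_epochs.update(history)

    diverging = []
    for epoch in sorted(all_epochs):
        # Collect the distinct views of this epoch across all nodes.
        views = {tuple(history.get(epoch, ())) for history in histories.values()}
        if len(views) > 1:
            diverging.append(epoch)
    return diverging


print(diverging_epochs(histories))  # [2]
```

Only epoch 2 differs, which is the epoch created while node 7000 was
running alone. A check of this kind only detects the divergence; it does
not decide which node's history should win.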
Thanks,
Yuan