[sheepdog] the use case for leave_list/leave_nodes
MORITA Kazutaka
morita.kazutaka at lab.ntt.co.jp
Thu Jun 7 04:33:35 CEST 2012
At Wed, 6 Jun 2012 18:38:55 -0400,
Christoph Hellwig wrote:
>
> I'm trying to understand the use case for the leave_list and all code
> associated with it.
>
> From my reading the intention is to allow a cluster to start as long
> as all the original nodes tried to join the cluster. What makes an
> original node that tried to join the cluster but failed special over
> one that never tried to join? It's not going to help us with getting
> copies from it without a manual recovery at least.
How can we know whether an added node will join or fail without
trying to add it to the cluster?

The idea behind waiting for all the original nodes is that we need to
ensure that no other node has the latest data. I'm using the script
below to test master transfer. Is it possible to pass the test without
leave_list? If so, that would be great, but I think it is difficult.
====
#!/bin/bash
set -ex

DRIVER=${DRIVER:-local}

for i in 0 1; do
    sheep/sheep /store/$i -z $i -p 700$i -c "$DRIVER"
    sleep 1
done

# start Sheepdog with two nodes
collie/collie cluster format -c 2

for i in 2 3 4; do
    # add one node after killing existing one node
    pkill -f "sheep /store/$((i - 2))"
    sleep 1
    sheep/sheep /store/$i -z $i -p 700$i -c "$DRIVER"
    sleep 1
done

# kill all existing nodes
for i in 3 4; do
    pkill -f "sheep /store/$i"
    sleep 1
done

# restart all the nodes
for i in 0 1 2 3 4; do
    sheep/sheep /store/$i -z $i -p 700$i -c "$DRIVER"
    sleep 1
done

echo check whether Sheepdog is running with only one node
collie/collie cluster info -p 7004

# add the other nodes
for i in 0 1 2 3; do
    sheep/sheep /store/$i -z $i -p 700$i -c "$DRIVER"
    sleep 1
done

echo check whether all nodes have the same cluster info
for i in 0 1 2 3 4; do
    collie/collie cluster info -p 700$i
done
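To make the start condition discussed above concrete, here is a minimal shell sketch, assuming the rule is the one described in the quoted mail: the cluster may start once every node in the latest epoch (the "original" nodes) has either joined or tried to join and failed, i.e. is on the leave list. The function can_start_cluster and its arguments are hypothetical and do not correspond to sheepdog's actual code.

```bash
#!/bin/bash
# Hypothetical sketch of the start condition; not sheepdog's real code.
#
#   $1  space-separated IDs of the nodes in the latest epoch (original nodes)
#   $2  space-separated IDs of the nodes that have joined
#   $3  space-separated IDs of the nodes that tried to join but failed
#       (what the mail calls leave_list)
#
# Returns 0 when every original node has either joined or is on the
# leave list, and 1 as soon as one original node has not tried yet.
can_start_cluster() {
    local epoch_nodes=$1 joined=$2 leave_list=$3
    local n
    for n in $epoch_nodes; do    # unquoted on purpose: split the ID list
        # Pad with spaces so that ID "1" does not match inside "10".
        if [[ " $joined $leave_list " != *" $n "* ]]; then
            # This node has not tried to join, so it may hold the latest data.
            return 1
        fi
    done
    return 0
}
```

For example, can_start_cluster "0 1" "0" "" fails because node 1 has not tried to join yet, while can_start_cluster "0 1" "0" "1" succeeds. The point of waiting is that a node we have not heard from might be the one with the newest epoch.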