[sheepdog] [PATCH 0/3] fix master transfers
Christoph Hellwig
hch at infradead.org
Tue Aug 7 17:15:38 CEST 2012
A newly joining node with a higher epoch than the one on the current
master gets a CJ_RES_MASTER_TRANSFER, which should make it take over
mastership. Currently this case only kills the current master, which
might cause incorrect epoch log entries on other nodes and thus cause
additional crashes down the road.
This series makes sure all nodes die in this case and can be restarted
by the management tool so that we can get back to a healthy cluster
quickly. I'd love to be able to totally reset the state inside a sheep
daemon for this case, but so far I have not found an easy way to do it.
A simple test case for master transfers is below:
#!/bin/bash
set -e
set -x
CLUSTER="-c local"
BASEDIR=/mnt/sheepdog
SRCDIR=/home/hch/work/sheepdog
SHEEP="${SRCDIR}/sheep/sheep ${CLUSTER}"
COLLIE="${SRCDIR}/collie/collie"
killall sheep || true
mkdir -p ${BASEDIR}
rm -rf ${BASEDIR}/7???
# start three sheep and format the cluster
for i in `seq 7000 7002`; do
    ${SHEEP} -p $i -z $i ${BASEDIR}/${i} -P ${BASEDIR}/${i}/sheep.pid
    sleep 1
done
${COLLIE} cluster format
# start three more sheep
for i in `seq 7003 7005`; do
    ${SHEEP} -p $i -z $i ${BASEDIR}/${i} -P ${BASEDIR}/${i}/sheep.pid
    sleep 1
done
# stop three sheep
for i in `seq 7003 7005`; do
    kill `cat ${BASEDIR}/${i}/sheep.pid`
    rm ${BASEDIR}/${i}/sheep.pid
done
sleep 1
# and shut the cluster down
${COLLIE} cluster shutdown
# restart the three sheep that were stopped earlier
for i in `seq 7005 -1 7003`; do
    ${SHEEP} -p $i -z $i ${BASEDIR}/${i} -P ${BASEDIR}/${i}/sheep.pid
    sleep 1
done
# and restart the first three sheep
for i in `seq 7000 7002`; do
    ${SHEEP} -p $i -z $i ${BASEDIR}/${i} -P ${BASEDIR}/${i}/sheep.pid
    sleep 1
done
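
The test script above only drives the cluster into the master transfer
case; it does not itself report which daemons survive. A minimal sketch
of such a check follows. It is not part of the series: the helper names
sheep_alive and report_sheep are hypothetical, and it assumes the pid
files written via -P are still in place and that "kill -0" on the
recorded pid is a good enough liveness probe.

```shell
# Hypothetical helpers, not part of the patch series: report which sheep
# from the test above still have a live process behind their pid file.
BASEDIR=${BASEDIR:-/mnt/sheepdog}

# Succeeds only if the pid file ($1) exists and names a running process.
sheep_alive() {
    [ -f "$1" ] || return 1
    # kill -0 sends no signal; it only checks that the pid can be signalled.
    kill -0 "$(cat "$1")" 2>/dev/null
}

# Print one "alive" or "dead" line per sheep port used by the test.
report_sheep() {
    for port in $(seq 7000 7005); do
        if sheep_alive "${BASEDIR}/${port}/sheep.pid"; then
            echo "sheep ${port}: alive"
        else
            echo "sheep ${port}: dead"
        fi
    done
}
```

Calling report_sheep after the last loop of the test gives a quick
picture of the result. If the series behaves as described, presumably no
sheep should be left alive at that point, whereas without it only the
old master would be expected to have died.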