[sheepdog] [REGRESSION] confchg is blocked by cluster requests, resulting in cluster hang

Liu Yuan namei.unix at gmail.com
Wed Jul 4 19:25:30 CEST 2012


Hi list,

   I found a problem for blocking requests handling and confchg, where
confchg event is blocked for ever. I think this is related to last
refactor of block event handling,
------
commit 48fde8304c3fa713cf549ad5523b03151917509a
Author: Christoph Hellwig <hch at infradead.org>
Date:   Wed May 16 03:04:05 2012 -0400

    sheep: rewrite blocked notifications
------

   I had met this problem occasionally for several days and for now I
can reproduce it reliably by following script:

=============
for i in `seq 0 7`; do sheep/sheep -d /home/tailai.ly/sheepdog/store/$i
-z $i -p $((7000+$i));done
sleep 1
collie/collie cluster format  -c 3
echo create new vdis
(
for i in `seq 0 40`;do
collie/collie vdi create test$i 4M
done
) &

echo kill nodes
sleep 1
for i in 1 2 3 4 5; do pkill -f "sheep/sheep -d
/home/tailai.ly/sheepdog/store/$i -z $i -p 700$i";sleep 1;done;

for i in `seq 1 5`; do sheep/sheep -d /home/tailai.ly/sheepdog/store/$i
-z $i -p $((7000+$i));done

echo wait for object recovery to finish
for ((;;)); do
        if [ "$(pgrep collie)" ]; then
                sleep 1
        else
                break
        fi
done
exit 1
===========================



More information about the sheepdog mailing list