At Mon, 27 May 2013 14:23:01 +0800,
Liu Yuan wrote:
>
> +
> +void kick_node_recover(void)
> +{
> +	struct vnode_info *old = main_thread_get(current_vnode_info);
> +
> +	main_thread_set(current_vnode_info,
> +			alloc_vnode_info(old->nodes, old->nr_nodes));
> +	start_recovery(main_thread_get(current_vnode_info), old, false);
> +	put_vnode_info(old);
> +}
> +

I wonder if starting a recovery without incrementing an epoch is a good
idea.  The new free space information is not written to the current
epoch log, so I guess sheep cannot recover objects correctly if node
failure happens while (or after) reweighting.

Actually, this series doesn't pass the below test.

commit 2b0ec4ea35536be672627094d588358adfbfe18d
Author: MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp>
Date:   Mon May 27 16:18:33 2013 +0900

    tests: test node failure while reweighting

    Signed-off-by: MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp>

diff --git a/tests/064 b/tests/064
new file mode 100755
index 0000000..2a5c6c0
--- /dev/null
+++ b/tests/064
@@ -0,0 +1,58 @@
+#!/bin/bash
+
+# Test node failure while reweighting
+seq=`basename $0`
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1        # failure is the default!
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_need_to_be_root
+
+_cleanup
+
+_make_device 0 $((100 * 1024 ** 2)) # 100 MB
+_make_device 1 $((100 * 1024 ** 2)) # 100 MB
+_make_device 2 $((200 * 1024 ** 2)) # 200 MB
+_make_device 3 $((100 * 1024 ** 3)) # 100 GB
+
+#start three in different size
+for i in 0 1 2; do
+    _start_sheep $i
+done
+_wait_for_sheep 3
+$COLLIE cluster format -c 2
+sleep 1
+
+$COLLIE vdi create test 160M -P
+$COLLIE node info
+for i in 0 1 2; do
+    $COLLIE node md info -p 700$i
+done
+
+$COLLIE node md plug $STORE/3
+_wait_for_sheep_recovery 0
+$COLLIE node info
+for i in 0 1 2; do
+    $COLLIE node md info -p 700$i
+done
+
+$COLLIE cluster reweight
+
+# restart sheep1 while reweighting
+_kill_sheep 2
+_wait_for_sheep_recovery 0
+_start_sheep 2
+_wait_for_sheep 2
+
+_wait_for_sheep_recovery 0
+$COLLIE node info
+for i in 0 1 2; do
+    $COLLIE node md info -p 700$i
+done
+$COLLIE cluster info | _filter_cluster_info
diff --git a/tests/064.out b/tests/064.out
new file mode 100644
index 0000000..e69de29
diff --git a/tests/group b/tests/group
index 40a24b0..d8824e3 100644
--- a/tests/group
+++ b/tests/group
@@ -77,3 +77,4 @@
 061 auto quick cluster md
 062 auto quick cluster md
 063 auto quick cluster md
+064 auto quick cluster md
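
The epoch-log concern raised above can be illustrated with a toy model. This is only a sketch and not sheepdog code: the `locate` and `count_misplaced` helpers, the weight lists, and the modulo-based placement are all hypothetical stand-ins for the real vnode logic. It assumes that a later recovery derives old object locations from the weights recorded in the epoch log, and compares that with where the in-memory (reweighted) weights actually placed the objects.

```shell
#!/bin/bash
# Toy model only: none of these names exist in sheepdog.

# locate WEIGHTS OID: print the index of the node that owns OID when
# nodes are chosen in proportion to the space-separated WEIGHTS.
locate() (
    weights=$1 oid=$2 total=0
    for w in $weights; do total=$((total + w)); done
    slot=$((oid % total)) idx=0
    for w in $weights; do
        if [ "$slot" -lt "$w" ]; then echo "$idx"; exit 0; fi
        slot=$((slot - w)) idx=$((idx + 1))
    done
)

# count_misplaced LOGGED LIVE: count objects 0..11 whose location derived
# from the LOGGED weights differs from where the LIVE weights put them.
count_misplaced() (
    logged=$1 live=$2 n=0 oid=0
    while [ "$oid" -lt 12 ]; do
        [ "$(locate "$logged" "$oid")" = "$(locate "$live" "$oid")" ] || n=$((n + 1))
        oid=$((oid + 1))
    done
    echo "$n"
)

logged="1 1 2"   # weights recorded in the current epoch log (assumed)
live="1 1 4"     # weights after reweighting, held only in memory (assumed)

# Reweight without a new epoch: the log still holds the old weights.
echo "stale log: $(count_misplaced "$logged" "$live") of 12 objects looked up in the wrong place"
# Reweight with a new epoch: the log records the weights actually in use.
echo "fresh log: $(count_misplaced "$live" "$live") of 12 objects looked up in the wrong place"
```

In this model the stale-log case reports a non-zero number of misplaced lookups, while the fresh-log case reports none, which is the behaviour that incrementing the epoch on reweight would be expected to give.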