[sheepdog] Is it necessary for outstanding io block leave/join event?
MORITA Kazutaka
morita.kazutaka at lab.ntt.co.jp
Fri May 18 20:01:56 CEST 2012
At Thu, 17 May 2012 15:48:07 +0800,
Liu Yuan wrote:
>
> On 05/17/2012 03:40 PM, MORITA Kazutaka wrote:
>
> > I thought that one advantage of the simple_store driver was that it
> > uses a syscall link() to copy objects from local older epochs to the
> > current epoch, so we could avoid many I/Os in the recovery process.
> > However, it seems that the link operation of the farm driver is not
> > called at all on my environment. Does Farm do the recovery process in
> > the different way from the simple_store driver?
>
>
> farm_link() will be called for multiple nodes events and in a very
> unusual corner cases. Actually, for the case you describe Farm works in
> a more optimal way: there isn't any operations for the object that isn't
> to be migrated to other nodes, save a system call of link() than simple
> store.
If that is true, I would rather see the optimization implemented in the
recovery core code than in the farm driver. But does the optimization
work correctly? I couldn't find any code that tries to avoid the
redundant link calls, and the farm driver actually failed to recover
objects correctly with the following testcase:
[Testcase script]
==
#!/bin/bash
set -ex
STORE=$1
# start three sheep daemons
for i in 0 1 2; do
    ./sheep/sheep /store/$i -z $i -p 700$i -W
done
sleep 1
./collie/collie cluster format -c 2 -b $STORE
# create a pre-allocated vdi
./collie/collie vdi create test 80M -P
# stop the 3rd sheep
pkill -f "sheep /store/2"
# write data to the vdi
cat /dev/urandom | ./collie/collie vdi write test
# restart the 3rd sheep
./sheep/sheep /store/2 -z 2 -p 7002 -W
# wait for object recovery to finish
sleep 10
# show md5sum of the vdi on each node
for i in 0 1 2; do
    ./collie/collie vdi read test -p 700$i | md5sum
done
==
[Results]
$ ./testcase.sh simple
...
(snip)
...
+ for i in 0 1 2
+ ./collie/collie vdi read test -p 7000
+ md5sum
6ebd372401d0848734293709bb7b3cb7 -
+ for i in 0 1 2
+ ./collie/collie vdi read test -p 7001
+ md5sum
6ebd372401d0848734293709bb7b3cb7 -
+ for i in 0 1 2
+ ./collie/collie vdi read test -p 7002
+ md5sum
6ebd372401d0848734293709bb7b3cb7 -
$ ./testcase.sh farm
...
(snip)
...
+ for i in 0 1 2
+ ./collie/collie vdi read test -p 7000
+ md5sum
ef8bd9bbc1f140979405ac08abd24541 -
+ for i in 0 1 2
+ ./collie/collie vdi read test -p 7001
+ md5sum
dee273206981c7f821061310eac90cd3 -
+ for i in 0 1 2
+ ./collie/collie vdi read test -p 7002
+ md5sum
ca74a3b2e031a20b03c3baa4af9ab9c5 -
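The comparison above can also be checked mechanically instead of by eye. The following is only a sketch: checksums_agree is a hypothetical helper name, not part of sheepdog or collie, and it assumes md5sum-style lines (hash in the first field) on stdin.

```shell
#!/bin/bash
# Hypothetical helper (not part of sheepdog): exits 0 only if at least one
# checksum line was read from stdin and every line carries the same hash.
checksums_agree() {
    awk '
        { seen[$1] = 1 }              # remember each distinct hash
        END {
            n = 0
            for (h in seen) n++       # count the distinct hashes
            exit (n == 1 ? 0 : 1)     # exactly one hash means all agree
        }
    '
}

# Possible usage with the cluster from the testcase (ports 7000-7002):
#   for i in 0 1 2; do
#       ./collie/collie vdi read test -p 700$i | md5sum
#   done | checksums_agree || echo "replicas differ" >&2
```

With the output shown above, the simple run would pass this check and the farm run would fail it, since all three nodes return different hashes.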
>
> This contributes to Farm to outperform simple store for recovery,
> because most objects are not to be migrated at all for a recovery.
I'm fine with dropping the simple driver if problems like the one above
are planned to be fixed in the farm driver. I just wish correctness
were regarded as more important than performance.
Thanks,
Kazutaka