[sheepdog] Is it necessary for outstanding io block leave/join event?

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Fri May 18 20:01:56 CEST 2012


At Thu, 17 May 2012 15:48:07 +0800,
Liu Yuan wrote:
> 
> On 05/17/2012 03:40 PM, MORITA Kazutaka wrote:
> 
> > I thought that one advantage of the simple_store driver was that it
> > uses the link() syscall to copy objects from older local epochs to the
> > current epoch, so we could avoid many I/Os in the recovery process.
> > However, it seems that the link operation of the farm driver is not
> > called at all in my environment.  Does Farm do the recovery process
> > differently from the simple_store driver?
> 
> 
> farm_link() will be called for multi-node events and in some very
> unusual corner cases. Actually, for the case you describe Farm works in
> a more optimal way: no operation is performed on objects that do not
> need to be migrated to other nodes, which saves a link() system call
> compared with simple store.

If that is true, I would prefer to see the optimization implemented in
the core recovery code rather than in the farm driver.  But does the
optimization work correctly?  I couldn't find any code that tries to
avoid the redundant link() calls, and in fact the farm driver failed to
recover objects correctly with the following testcase:

[Testcase script]
==
#!/bin/bash

set -ex

# store driver to test, passed as the first argument ("simple" or "farm")
STORE=$1

# start three sheep daemons
for i in 0 1 2; do
    ./sheep/sheep /store/$i -z $i -p 700$i -W
done

sleep 1
./collie/collie cluster format -c 2 -b $STORE

# create a pre-allocated vdi
./collie/collie vdi create test 80M -P

# stop the 3rd sheep
pkill -f "sheep /store/2"

# write data to the vdi
cat /dev/urandom | ./collie/collie vdi write test

# restart the 3rd sheep
./sheep/sheep /store/2 -z 2 -p 7002 -W

# wait for object recovery to finish
sleep 10

# show md5sum of the vdi on each node
for i in 0 1 2; do
    ./collie/collie vdi read test -p 700$i | md5sum
done
==

[Results]

 $ ./testcase.sh simple
 ...
 (snip)
 ...
 + for i in 0 1 2
 + ./collie/collie vdi read test -p 7000
 + md5sum
 6ebd372401d0848734293709bb7b3cb7  -
 + for i in 0 1 2
 + ./collie/collie vdi read test -p 7001
 + md5sum
 6ebd372401d0848734293709bb7b3cb7  -
 + for i in 0 1 2
 + ./collie/collie vdi read test -p 7002
 + md5sum
 6ebd372401d0848734293709bb7b3cb7  -

 $ ./testcase.sh farm
 ...
 (snip)
 ...
 + for i in 0 1 2
 + ./collie/collie vdi read test -p 7000
 + md5sum
 ef8bd9bbc1f140979405ac08abd24541  -
 + for i in 0 1 2
 + ./collie/collie vdi read test -p 7001
 + md5sum
 dee273206981c7f821061310eac90cd3  -
 + for i in 0 1 2
 + ./collie/collie vdi read test -p 7002
 + md5sum
 ca74a3b2e031a20b03c3baa4af9ab9c5  -
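
The comparison of the three digests could also be automated rather than
checked by eye.  The following is only a minimal sketch: checksums_match
is a hypothetical helper name, not part of the testcase or of sheepdog,
and it assumes md5sum-style input lines ("<digest>  -") on stdin.  It is
fed here with the digests reported above for the farm driver.

```shell
#!/bin/bash

# Hypothetical helper (not part of the original testcase): reads
# md5sum-style lines on stdin and succeeds only if there is exactly
# one distinct digest among them.
checksums_match() {
    local distinct
    # keep the digest column only, then count distinct values
    distinct=$(awk '{print $1}' | sort -u | wc -l)
    [ "$distinct" -eq 1 ]
}

# Digests copied from the farm run above.
if printf '%s\n' \
    'ef8bd9bbc1f140979405ac08abd24541  -' \
    'dee273206981c7f821061310eac90cd3  -' \
    'ca74a3b2e031a20b03c3baa4af9ab9c5  -' | checksums_match
then
    echo "all nodes returned the same data"
else
    echo "MISMATCH between nodes"
fi
```

In the testcase itself, the output of the final loop could be piped into
such a helper so that the script exits non-zero when the nodes disagree.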

> 
> This helps Farm outperform simple store during recovery, because
> most objects do not need to be migrated at all during a recovery.

I'm fine with dropping the simple driver if there is a plan to fix
this kind of problem in the farm driver.  I would like correctness
to be regarded as more important than performance.

Thanks,

Kazutaka


