On 05/19/2012 02:01 AM, MORITA Kazutaka wrote:
> At Thu, 17 May 2012 15:48:07 +0800,
> Liu Yuan wrote:
>>
>> On 05/17/2012 03:40 PM, MORITA Kazutaka wrote:
>>
>>> I thought that one advantage of the simple_store driver was that it
>>> uses the link() syscall to copy objects from local older epochs to the
>>> current epoch, so we could avoid many I/Os in the recovery process.
>>> However, it seems that the link operation of the farm driver is not
>>> called at all in my environment. Does Farm do recovery in a different
>>> way from the simple_store driver?
>>
>>
>> farm_link() is called for multi-node events and in very unusual
>> corner cases. Actually, for the case you describe, Farm works more
>> efficiently: there are no operations at all for an object that doesn't
>> need to be migrated to other nodes, which saves a link() system call
>> compared with simple store.
>
> If that is true, I'd like to see the implementation in the core recovery
> code instead of in the farm driver. But does the optimization work
> correctly? I couldn't find the code that tries to avoid the
> redundant link calls, and in fact the farm driver couldn't recover
> objects correctly with the following testcase:
>

I can't simply put it in the core recovery code, because simple store and
farm don't agree on the underlying layout (simple store names objects by
epoch/oid, while farm names them by oid only). But if we remove simple
store, we might end up with better core code.
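To make the layout difference concrete, here is a minimal shell sketch under assumed paths. The `simple/<epoch>/<oid>` and `farm/<oid>` directories, the epoch numbers and the object id are hypothetical, not sheepdog's real on-disk format. With the epoch in the path, carrying an object into a new epoch costs one link() call. With a flat oid path, an object that stays on the node needs no operation at all.

```bash
#!/bin/bash
# Hypothetical layouts for illustration only; not sheepdog's actual code or paths.
set -e

base=$(mktemp -d)
trap 'rm -rf "$base"' EXIT
oid=0080000000000001            # hypothetical object id

# simple_store-style naming: <base>/simple/<epoch>/<oid>
mkdir -p "$base/simple/1" "$base/simple/2"
printf 'object data' > "$base/simple/1/$oid"

# Recovering into epoch 2: hard-link the epoch-1 file instead of copying it.
# ln (link(2)) only adds a directory entry; no object data is read or written.
ln "$base/simple/1/$oid" "$base/simple/2/$oid"

# farm-style naming: <base>/farm/<oid>, with no epoch in the path.
mkdir -p "$base/farm"
printf 'object data' > "$base/farm/$oid"
# If the object stays on this node after the epoch change, its path is
# unchanged, so recovery has nothing to do for it.

if [ "$base/simple/1/$oid" -ef "$base/simple/2/$oid" ]; then
    echo "epoch 1 and epoch 2 entries share one inode"
fi
```

The `-ef` test confirms that both epoch entries refer to the same file. This is why the simple_store approach avoids data I/O even though it still issues one syscall per object.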
> [Testcase script]
> ==
> #!/bin/bash
>
> set -ex
>
> STORE=$1
>
> # start three sheep daemons
> for i in 0 1 2; do
>     ./sheep/sheep /store/$i -z $i -p 700$i -W
> done
>
> sleep 1
> ./collie/collie cluster format -c 2 -b $STORE
>
> # create a pre-allocated vdi
> ./collie/collie vdi create test 80M -P
>
> # stop the 3rd sheep
> pkill -f "sheep /store/2"
>
> # write data to the vdi
> cat /dev/urandom | ./collie/collie vdi write test
>
> # restart the 3rd sheep
> ./sheep/sheep /store/2 -z 2 -p 7002 -W
>
> # wait for object recovery to finish
> sleep 10
>
> # show md5sum of the vdi on each node
> for i in 0 1 2; do
>     ./collie/collie vdi read test -p 700$i | md5sum
> done
> ==
>

Very good test script. I've drafted a patch for it; with this patch, farm
works as expected.

> [Results]
>
> $ ./testcase.sh simple
> ...
> (snip)
> ...
> + for i in 0 1 2
> + ./collie/collie vdi read test -p 7000
> + md5sum
> 6ebd372401d0848734293709bb7b3cb7 -
> + for i in 0 1 2
> + ./collie/collie vdi read test -p 7001
> + md5sum
> 6ebd372401d0848734293709bb7b3cb7 -
> + for i in 0 1 2
> + ./collie/collie vdi read test -p 7002
> + md5sum
> 6ebd372401d0848734293709bb7b3cb7 -
>
> $ ./testcase.sh farm
> ...
> (snip)
> ...
> + for i in 0 1 2
> + ./collie/collie vdi read test -p 7000
> + md5sum
> ef8bd9bbc1f140979405ac08abd24541 -
> + for i in 0 1 2
> + ./collie/collie vdi read test -p 7001
> + md5sum
> dee273206981c7f821061310eac90cd3 -
> + for i in 0 1 2
> + ./collie/collie vdi read test -p 7002
> + md5sum
> ca74a3b2e031a20b03c3baa4af9ab9c5 -
>
>>
>> This is why Farm outperforms simple store for recovery: most objects
>> don't need to be migrated at all during a recovery.
>
> I'm fine with dropping the simple driver if the above kinds of
> problems are planned to be fixed in the farm driver. I hope
> correctness will be regarded as more important than performance.
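The testcase leaves the final md5sum comparison to the reader's eye. The sketch below shows one way to turn it into a pass/fail check that can run unattended. It assumes the same binaries, VDI name and ports as the testcase script. `all_same` and `node_sums` are hypothetical helper names, not part of sheepdog.

```bash
#!/bin/bash
# Hypothetical helpers for the testcase script; not part of sheepdog.

# Return 0 if every argument is the same string, 1 otherwise.
all_same() {
    local first=$1 sum
    for sum in "$@"; do
        [ "$sum" = "$first" ] || return 1
    done
    return 0
}

# Print the md5sum of the "test" VDI as read through each node
# (ports 7000-7002), keeping only the hash field of md5sum's output.
node_sums() {
    local i
    for i in 0 1 2; do
        ./collie/collie vdi read test -p "700$i" | md5sum | cut -d' ' -f1
    done
}
```

Usage: replace the script's final loop with `if all_same $(node_sums); then echo "all replicas match"; else echo "replicas differ" >&2; exit 1; fi`. With the results quoted above, the simple run would pass and the farm run would fail.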
Sure, I think farm can meet the correctness requirements. There may still
be some bugs lurking, like the one your example exposed, but that doesn't
mean farm can't fix them.

Thanks,
Yuan