[sheepdog] [PATCH v4 8/8] sheep: make gateway requests only retry in io_op_done()

MORITA Kazutaka morita.kazutaka at gmail.com
Sun May 27 20:39:30 CEST 2012


At Sat, 26 May 2012 11:28:28 +0800,
levin li wrote:
> 
> On 05/25/2012 10:22 PM, Christoph Hellwig wrote:
> > On Fri, May 25, 2012 at 10:31:00AM +0800, levin li wrote:
> >> From: levin li <xingke.lwp at taobao.com>
> >>
> >> We should not make gateway retry in check_request when the
> >> requested object is in recovery, we should make it retry in
> >> io_op_done(), gateway request does not access local objects,
> >> so we should not make it retry when the local objects are in
> >> recovery.
> > 
> > This patch seems to break the following simple test case that reads
> > from a newly started sheep because it can't find the object yet:
> > 
> > sheep -p 7000 /tmp/sheep/0
> > collie cluster format --copies=1
> > collie vdi create test-vdi 300M
> > dd if=/dev/zero count=100M | collie vdi write tet-vdi
> > 
> > sheep -p 7001 /tmp/sheep/1
> > collie vdi read -p 7001
> > 
> 
> I tested with your script, and make it run on my computer like this:
> 
> sheep -d -p 7000 /tmp/sheep/0 -z 0
> collie cluster format -c 1
> collie vdi create test-vdi 300M
> dd if=/dev/zero count=100M | collie vdi write test-vdi
> sheep -d -p 7001 /tmp/sheep/1 -z 1
> collie vdi read -p 7001 test-vdi 0 100M
> 
> I got a error message:
> Cannot get VDI info for test-vdi 0 : Waiting for cluster to be formatted
> Failed to open VDI test-vdi

This is a different problem.

I think what Christoph pointed out is that the gateway node can also
be a node that is supposed to hold the requested object locally.  In
that case, sheep returns the SD_RES_NO_OBJ error to the caller until
the object is recovered.

I guess the script below reproduces the problem more easily.

==
OBJSIZE=$((4 * 1024 * 1024))

sheep -p 7000 /tmp/sheep/0
sleep 1
collie cluster format --copies=1
collie vdi create test-vdi 100M
for i in `seq 0 20`; do
    echo $i | collie vdi write test-vdi $(($i * $OBJSIZE)) 512
done

sheep -p 7001 /tmp/sheep/1
sleep 1

for i in `seq 0 20`; do
    collie vdi read test-vdi -p 7001 $(($i * $OBJSIZE)) 512
done
==
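
As a rough illustration of the offsets the two loops above use, here is
a small sketch that prints the object index and byte offset for each
iteration.  It assumes the 4 MB object size that the script itself sets
in OBJSIZE; the helper name object_offsets is made up for this example.

```shell
#!/bin/sh
# Same value as OBJSIZE in the reproduction script.
OBJSIZE=$((4 * 1024 * 1024))

# object_offsets FIRST LAST
# Print "index offset" for every loop iteration from FIRST to LAST.
# (Hypothetical helper, only for illustration.)
object_offsets() {
    first=$1
    last=$2
    i=$first
    while [ "$i" -le "$last" ]; do
        echo "$i $((i * OBJSIZE))"
        i=$((i + 1))
    done
}

object_offsets 0 20
```

Each iteration lands on a separate object, and the last offset plus 512
bytes is still inside the 100M VDI.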

> 
> But, even though I popup all the patches in the patch set, it still give
> that error message, I think it's another problem, or maybe it's not a bug
> at all, because later I try to run 'collie vdi read -p 7001 test-vdi 0 100M'
> again, it works well.

The problem doesn't happen after the recovery is completed.
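
For testing around this, a caller-side workaround could look like the
sketch below: retry the command until it succeeds or a limit is
reached.  This is only an illustration, not something sheep or collie
provides.  The function name retry_until_ok is made up, and it assumes
that the retried command exits with a non-zero status when it fails.

```shell
#!/bin/sh
# retry_until_ok MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Run COMMAND until it exits with status 0, at most MAX_TRIES times,
# sleeping DELAY_SECONDS between attempts.
# (Hypothetical helper; assumes COMMAND signals failure via exit status.)
retry_until_ok() {
    max_tries=$1
    delay=$2
    shift 2
    try=1
    while :; do
        # Success: stop retrying.
        "$@" && return 0
        # Out of attempts: report failure to the caller.
        [ "$try" -ge "$max_tries" ] && return 1
        try=$((try + 1))
        sleep "$delay"
    done
}

# Example usage with the read from the script above (not run here):
#   retry_until_ok 10 1 collie vdi read test-vdi -p 7001 0 512
```

This only hides the error from the caller; it does not change what the
gateway returns while the object is still being recovered.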

Thanks,

Kazutaka



