[sheepdog] [PATCH v4 8/8] sheep: make gateway requests only retry in io_op_done()
MORITA Kazutaka
morita.kazutaka at gmail.com
Sun May 27 20:39:30 CEST 2012
At Sat, 26 May 2012 11:28:28 +0800,
levin li wrote:
>
> On 05/25/2012 10:22 PM, Christoph Hellwig wrote:
> > On Fri, May 25, 2012 at 10:31:00AM +0800, levin li wrote:
> >> From: levin li <xingke.lwp at taobao.com>
> >>
> >> We should not make a gateway request retry in check_request when the
> >> requested object is in recovery; it should retry in io_op_done()
> >> instead. A gateway request does not access local objects, so it
> >> should not be retried just because the local objects are in
> >> recovery.
> >
> > This patch seems to break the following simple test case that reads
> > from a newly started sheep because it can't find the object yet:
> >
> > sheep -p 7000 /tmp/sheep/0
> > collie cluster format --copies=1
> > collie vdi create test-vdi 300M
> > dd if=/dev/zero count=100M | collie vdi write test-vdi
> >
> > sheep -p 7001 /tmp/sheep/1
> > collie vdi read -p 7001
> >
>
> I tested with your script, running it on my machine like this:
>
> sheep -d -p 7000 /tmp/sheep/0 -z 0
> collie cluster format -c 1
> collie vdi create test-vdi 300M
> dd if=/dev/zero count=100M | collie vdi write test-vdi
> sheep -d -p 7001 /tmp/sheep/1 -z 1
> collie vdi read -p 7001 test-vdi 0 100M
>
> I got an error message:
> Cannot get VDI info for test-vdi 0 : Waiting for cluster to be formatted
> Failed to open VDI test-vdi
This is a different problem.
I think what Christoph pointed out is that the gateway node can itself be
the node that should store the requested object locally. In that case,
sheep returns the SD_RES_NO_OBJ error to the caller until the object is
recovered.
I guess the script below reproduces the problem more easily.
==
OBJSIZE=$((4 * 1024 * 1024))
sheep -p 7000 /tmp/sheep/0
sleep 1
collie cluster format --c=1
collie vdi create test-vdi 100M
for i in `seq 0 20`; do
    echo $i | collie vdi write test-vdi $(($i * $OBJSIZE)) 512
done
sheep -p 7001 /tmp/sheep/1
sleep 1
for i in `seq 0 20`; do
    collie vdi read test-vdi -p 7001 $(($i * $OBJSIZE)) 512
done
==
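For reference, here is a small sketch of the offset arithmetic in the
script above. It assumes that each data object covers OBJSIZE bytes, as
the variable name suggests, so every write lands in a different object.
The obj_index helper is hypothetical and exists only for illustration.

```shell
#!/bin/sh
# Assumption: one data object covers OBJSIZE bytes (4 MiB in the script).
OBJSIZE=$((4 * 1024 * 1024))

# obj_index OFFSET: print the index of the object that contains OFFSET.
# Hypothetical helper, not part of collie or sheep.
obj_index() {
    echo $(($1 / OBJSIZE))
}

# Show the first, second, and last offsets used by the loop above.
for i in 0 1 20; do
    off=$((i * OBJSIZE))
    echo "offset=$off object=$(obj_index "$off")"
done
```

With 512-byte writes at these offsets, 21 distinct objects are touched,
so the second sheep has several objects to recover when it joins.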
>
> But even after I pop all the patches in the patch set, it still gives
> that error message. I think it's another problem, or maybe it's not a bug
> at all, because when I later ran 'collie vdi read -p 7001 test-vdi 0 100M'
> again, it worked well.
The problem doesn't happen after the recovery is completed.
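Until this is fixed, a test script could work around it by retrying the
read while recovery is still running. The following is only a sketch: the
retry helper is hypothetical, and it assumes that the command exits with
a non-zero status when the read fails.

```shell
#!/bin/sh
# retry MAX DELAY CMD...: run CMD until it succeeds, at most MAX times,
# sleeping DELAY seconds between attempts. Hypothetical helper.
retry() {
    max=$1
    delay=$2
    shift 2
    n=1
    while ! "$@"; do
        # Give up once the attempt limit is reached.
        [ "$n" -ge "$max" ] && return 1
        n=$((n + 1))
        sleep "$delay"
    done
    return 0
}

# Example use (assumes a non-zero exit status on a failed read):
#   retry 10 1 collie vdi read test-vdi -p 7001 0 512
```

This only hides the error from the caller of the script; it does not
change what sheep returns while the object is being recovered.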
Thanks,
Kazutaka