[sheepdog] [PATCH v4 8/8] sheep: make gateway requests only retry in io_op_done()

levin li levin108 at gmail.com
Mon May 28 04:56:33 CEST 2012


On 05/28/2012 02:39 AM, MORITA Kazutaka wrote:
> At Sat, 26 May 2012 11:28:28 +0800,
> levin li wrote:
>>
>> On 05/25/2012 10:22 PM, Christoph Hellwig wrote:
>>> On Fri, May 25, 2012 at 10:31:00AM +0800, levin li wrote:
>>>> From: levin li <xingke.lwp at taobao.com>
>>>>
>>>> We should not make a gateway request retry in check_request when
>>>> the requested object is in recovery; it should retry in
>>>> io_op_done() instead. A gateway request does not access local
>>>> objects, so it should not be retried just because the local
>>>> objects are in recovery.
>>>
>>> This patch seems to break the following simple test case that reads
>>> from a newly started sheep because it can't find the object yet:
>>>
>>> sheep -p 7000 /tmp/sheep/0
>>> collie cluster format --copies=1
>>> collie vdi create test-vdi 300M
>>> dd if=/dev/zero count=100M | collie vdi write test-vdi
>>>
>>> sheep -p 7001 /tmp/sheep/1
>>> collie vdi read -p 7001
>>>
>>
>> I tested with your script, running it on my machine like this:
>>
>> sheep -d -p 7000 /tmp/sheep/0 -z 0
>> collie cluster format -c 1
>> collie vdi create test-vdi 300M
>> dd if=/dev/zero count=100M | collie vdi write test-vdi
>> sheep -d -p 7001 /tmp/sheep/1 -z 1
>> collie vdi read -p 7001 test-vdi 0 100M
>>
>> I got an error message:
>> Cannot get VDI info for test-vdi 0 : Waiting for cluster to be formatted
>> Failed to open VDI test-vdi
> 
> This is another problem.
> 
> I think what Christoph pointed out is that the gateway node can also
> hold the requested object locally. In that case, sheep returns the
> SD_RES_NO_OBJ error to the caller until the object is recovered.
> 
> I guess the script below could reproduce the problem more easily.
> 
> ==
> OBJSIZE=$((4 * 1024 * 1024))
> 
> sheep -p 7000 /tmp/sheep/0
> sleep 1
> collie cluster format --copies=1
> collie vdi create test-vdi 100M
> for i in `seq 0 20`; do
>     echo $i | collie vdi write test-vdi $(($i * $OBJSIZE)) 512
> done
> 
> sheep -p 7001 /tmp/sheep/1
> sleep 1
> 
> for i in `seq 0 20`; do
>     collie vdi read test-vdi -p 7001 $(($i * $OBJSIZE)) 512
> done
> ==
> 

It's indeed a bug in my code; I'll fix it.

thanks,
levin

>>
>> But even after I pop all the patches in the patch set, it still gives
>> that error message. I think it's another problem, or maybe not a bug
>> at all, because when I later ran 'collie vdi read -p 7001 test-vdi 0 100M'
>> again, it worked well.
> 
> The problem doesn't happen after the recovery is completed.
> 
> Thanks,
> 
> Kazutaka
> 
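
For reference, the offsets used by the reproduction script quoted above
can be checked without starting sheep or collie. This is only a sketch:
object_offset is a hypothetical helper, not a collie command, and it
assumes the script's 4 MiB OBJSIZE matches the data object size, so that
each index maps to the start of a different object.

```shell
#!/bin/sh
# Illustrative only: prints the offsets the reproduction loops would use.
# It does not invoke sheep or collie.
OBJSIZE=$((4 * 1024 * 1024))   # same value as in the quoted script

# object_offset is a hypothetical helper: byte offset of object index $1.
object_offset() {
    echo $(($1 * OBJSIZE))
}

# Show the first two and the last index of `seq 0 20`.
for i in 0 1 20; do
    echo "index $i -> offset $(object_offset "$i")"
done
```

With 21 indexes, the 512-byte writes span 84 MiB of the 100M VDI, so the
reads through port 7001 should touch 21 separate objects while recovery
may still be in progress.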



