At Sat, 26 May 2012 11:28:28 +0800,
levin li wrote:
>
> On 05/25/2012 10:22 PM, Christoph Hellwig wrote:
> > On Fri, May 25, 2012 at 10:31:00AM +0800, levin li wrote:
> >> From: levin li <xingke.lwp at taobao.com>
> >>
> >> We should not make gateway retry in check_request when the
> >> requested object is in recovery, we should make it retry in
> >> io_op_done(), gateway request does not access local objects,
> >> so we should not make it retry when the local objects are in
> >> recovery.
> >
> > This patch seems to break the following simple test case that reads
> > from a newly started sheep because it can't find the object yet:
> >
> > sheep -p 7000 /tmp/sheep/0
> > collie cluster format --copies=1
> > collie vdi create test-vdi 300M
> > dd if=/dev/zero count=100M | collie vdi write tet-vdi
> >
> > sheep -p 7001 /tmp/sheep/1
> > collie vdi read -p 7001
> >
>
> I tested with your script, and make it run on my computer like this:
>
> sheep -d -p 7000 /tmp/sheep/0 -z 0
> collie cluster format -c 1
> collie vdi create test-vdi 300M
> dd if=/dev/zero count=100M | collie vdi write test-vdi
> sheep -d -p 7001 /tmp/sheep/1 -z 1
> collie vdi read -p 7001 test-vdi 0 100M
>
> I got a error message:
> Cannot get VDI info for test-vdi 0 : Waiting for cluster to be formatted
> Failed to open VDI test-vdi

This is another problem.

I think what Christoph pointed out is that the gateway node can also
hold the requested object locally. In that case, sheep returns the
SD_RES_NO_OBJ error to the caller until the object is recovered. I
guess the script below reproduces the problem more easily.
==
OBJSIZE=$((4 * 1024 * 1024))

sheep -p 7000 /tmp/sheep/0
sleep 1
collie cluster format --c=1
collie vdi create test-vdi 100M

for i in `seq 0 20`; do
    echo $i | collie vdi write test-vdi $(($i * $OBJSIZE)) 512
done

sheep -p 7001 /tmp/sheep/1
sleep 1

for i in `seq 0 20`; do
    collie vdi read test-vdi -p 7001 $(($i * $OBJSIZE)) 512
done
==

>
> But, even though I popup all the patches in the patch set, it still give
> that error message, I think it's another problem, or maybe it's not a bug
> at all, because later I try to run 'collie vdi read -p 7001 test-vdi 0 100M'
> again, it works well.

The problem doesn't happen after the recovery is completed.

Thanks,

Kazutaka