On 05/28/2012 02:39 AM, MORITA Kazutaka wrote:
> At Sat, 26 May 2012 11:28:28 +0800,
> levin li wrote:
>>
>> On 05/25/2012 10:22 PM, Christoph Hellwig wrote:
>>> On Fri, May 25, 2012 at 10:31:00AM +0800, levin li wrote:
>>>> From: levin li <xingke.lwp at taobao.com>
>>>>
>>>> We should not make gateway retry in check_request when the
>>>> requested object is in recovery, we should make it retry in
>>>> io_op_done(), gateway request does not access local objects,
>>>> so we should not make it retry when the local objects are in
>>>> recovery.
>>>
>>> This patch seems to break the following simple test case that reads
>>> from a newly started sheep because it can't find the object yet:
>>>
>>> sheep -p 7000 /tmp/sheep/0
>>> collie cluster format --copies=1
>>> collie vdi create test-vdi 300M
>>> dd if=/dev/zero count=100M | collie vdi write tet-vdi
>>>
>>> sheep -p 7001 /tmp/sheep/1
>>> collie vdi read -p 7001
>>>
>>
>> I tested with your script, and make it run on my computer like this:
>>
>> sheep -d -p 7000 /tmp/sheep/0 -z 0
>> collie cluster format -c 1
>> collie vdi create test-vdi 300M
>> dd if=/dev/zero count=100M | collie vdi write test-vdi
>> sheep -d -p 7001 /tmp/sheep/1 -z 1
>> collie vdi read -p 7001 test-vdi 0 100M
>>
>> I got a error message:
>> Cannot get VDI info for test-vdi 0 : Waiting for cluster to be formatted
>> Failed to open VDI test-vdi
>
> This is another problem.
>
> I think what Christoph pointed out is that the gateway node can have
> the requested object in local, so in such case, sheep returns the
> SD_RES_NO_OBJ error to the caller until the object is recovered.
>
> I guess the below script could reproduce the problem more easily.
>
> ==
> OBJSIZE=$((4 * 1024 * 1024))
>
> sheep -p 7000 /tmp/sheep/0
> sleep 1
> collie cluster format --c=1
> collie vdi create test-vdi 100M
> for i in `seq 0 20`; do
>   echo $i | collie vdi write test-vdi $(($i * $OBJSIZE)) 512
> done
>
> sheep -p 7001 /tmp/sheep/1
> sleep 1
>
> for i in `seq 0 20`; do
>   collie vdi read test-vdi -p 7001 $(($i * $OBJSIZE)) 512
> done
> ==
>

It is indeed a bug in my code. I'll fix it.

Thanks,

levin

>>
>> But, even though I popup all the patches in the patch set, it still give
>> that error message, I think it's another problem, or maybe it's not a bug
>> at all, because later I try to run 'collie vdi read -p 7001 test-vdi 0 100M'
>> again, it works well.
>
> The problem doesn't happen after the recovery is completed.
>
> Thanks,
>
> Kazutaka
>