[sheepdog] [PATCH] sheep: fix oid scheduling in recovery
Liu Yuan
namei.unix at gmail.com
Tue Jun 5 13:17:37 CEST 2012
On 06/05/2012 07:14 PM, Liu Yuan wrote:
> Also, block/sheepdog.c in QEMU has a fatal race, which leads to
> requests being discarded by QEMU, or to a segfault, under a high-rate
> burst of requests.
More info about this problem:
It is highly reproducible:
1) start sheep with async flush
2) start qemu with cache=writeback
3) install a new OS from an ISO (I installed RHEL 6)
The problem is that qemu prints an error or even segfaults. When I attach
gdb the problem goes away, so I think it is a race.
For example:
qemu-system-x86_64: cannot find aio_req 76e
From sheep.log (I have patched sheep as follows):
diff --git a/sheep/sdnet.c b/sheep/sdnet.c
index 74d42f9..25242d9 100644
--- a/sheep/sdnet.c
+++ b/sheep/sdnet.c
@@ -502,6 +502,7 @@ static void init_tx_hdr(struct client_info *ci)
 	rsp->epoch = sys->epoch;
 	rsp->opcode = req->rq.opcode;
+	dprintf("0x%x\n", req->rq.id);
 	rsp->id = req->rq.id;
 }
...
Apr 18 00:28:21 queue_request(275) 1
Apr 18 00:28:21 queue_request(275) 1
Apr 18 00:28:21 init_tx_hdr(505) 0x775
Apr 18 00:28:21 do_io_request(923) 1, 7c2b25000002ba , 1
Apr 18 00:28:21 init_tx_hdr(505) 0x76e
Apr 18 00:28:21 object_cache_rw(319) 000002ba, len 2424832, off 0
Apr 18 00:28:21 do_io_request(923) 1, 7c2b25000002b8 , 1
Apr 18 00:28:21 init_tx_hdr(505) 0x76e
Apr 18 00:28:21 object_cache_rw(319) 000002b8, len 1142784, off 3051520
...
We can see that the same request id (0x76e) is sent back twice, so the
second response causes qemu to print the error.
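
To illustrate why a repeated id ends in that error, here is a minimal
sketch, not the actual block/sheepdog.c code. It assumes the client keeps
a table of in-flight requests keyed by id and retires an entry on the
first matching response. The names inflight_table, inflight_add and
inflight_complete are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_INFLIGHT 16

/* Hypothetical in-flight request table; not the real QEMU data structure. */
struct inflight_table {
    uint32_t ids[MAX_INFLIGHT];
    bool used[MAX_INFLIGHT];
};

/* Record a request as sent. Returns false if the table is full. */
bool inflight_add(struct inflight_table *t, uint32_t id)
{
    assert(t != NULL);
    for (size_t i = 0; i < MAX_INFLIGHT; i++) {
        if (!t->used[i]) {
            t->used[i] = true;
            t->ids[i] = id;
            return true;
        }
    }
    return false;
}

/*
 * Handle a response: find the request by id and retire it.
 * Returns false (and prints an error) when no in-flight request matches,
 * which is what happens to the second response carrying the same id.
 */
bool inflight_complete(struct inflight_table *t, uint32_t id)
{
    assert(t != NULL);
    for (size_t i = 0; i < MAX_INFLIGHT; i++) {
        if (t->used[i] && t->ids[i] == id) {
            t->used[i] = false;
            return true;
        }
    }
    fprintf(stderr, "cannot find aio_req %x\n", (unsigned int)id);
    return false;
}
```

With 0x775 and 0x76e added to a zero-initialized table, completing 0x76e
succeeds the first time and fails the second time, because the entry is
already retired.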
If I understand correctly, there should be no race on aioreq_seq_num (in
block/sheepdog.c)... but it seems there IS one...