On 06/05/2012 07:14 PM, Liu Yuan wrote:
> Also, block/sheepdog.c of QEMU has a fatal race problem, which leads
> to requests being discarded by QEMU, or to a segfault, under a high
> rate of request bursts.

More info about this problem:

It is highly reproducible:

1) start sheep with async flush
2) start qemu with cache=writeback
3) install a new OS from iso (I installed RHEL 6)

The problem is that qemu prints an error or even segfaults, but when I
attach gdb, the problem is gone, so I think it is a race problem.

For example:

  qemu-system-x86_64: cannot find aio_req 76e

From sheep.log (I have patched sheep):

diff --git a/sheep/sdnet.c b/sheep/sdnet.c
index 74d42f9..25242d9 100644
--- a/sheep/sdnet.c
+++ b/sheep/sdnet.c
@@ -502,6 +502,7 @@ static void init_tx_hdr(struct client_info *ci)
 	rsp->epoch = sys->epoch;
 	rsp->opcode = req->rq.opcode;
+	dprintf("0x%x\n", req->rq.id);
 	rsp->id = req->rq.id;
 }

...
Apr 18 00:28:21 queue_request(275) 1
Apr 18 00:28:21 queue_request(275) 1
Apr 18 00:28:21 init_tx_hdr(505) 0x775
Apr 18 00:28:21 do_io_request(923) 1, 7c2b25000002ba , 1
Apr 18 00:28:21 init_tx_hdr(505) 0x76e
Apr 18 00:28:21 object_cache_rw(319) 000002ba, len 2424832, off 0
Apr 18 00:28:21 do_io_request(923) 1, 7c2b25000002b8 , 1
Apr 18 00:28:21 init_tx_hdr(505) 0x76e
Apr 18 00:28:21 object_cache_rw(319) 000002b8, len 1142784, off 3051520
...

We can see that aio_req->id is set to the same value (0x76e) for two
different requests, so the second reply causes qemu to print the error.

IIUC, there should be no race for aioreq_seq_num (in block/sheepdog.c)...
but it seems there IS one...
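To make the suspected failure mode concrete, below is a minimal standalone
sketch, not QEMU code: the names seq_plain, seq_atomic, alloc_id_racy and
alloc_id_atomic are all hypothetical. It only illustrates the assumption
that an unsynchronized increment of a shared sequence counter (the role
aioreq_seq_num plays in block/sheepdog.c) can hand the same ID to two
requests when reached concurrently, which would match the duplicate 0x76e
in the log above.

/*
 * Sketch only; deliberately contains a data race (undefined behavior)
 * to demonstrate lost updates on a shared sequence counter.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

static uint32_t seq_plain;       /* racy: separate load, add, store */
static atomic_uint seq_atomic;   /* safe: one atomic read-modify-write */

/* Racy: two threads can both read the same value before either writes
 * the increment back, so both get the same ID. */
static uint32_t alloc_id_racy(void)
{
	return seq_plain++;
}

/* Safe: fetch-and-add makes read + increment a single indivisible step. */
static uint32_t alloc_id_atomic(void)
{
	return atomic_fetch_add(&seq_atomic, 1);
}

static void *worker(void *arg)
{
	(void)arg;
	for (int i = 0; i < 100000; i++) {
		alloc_id_racy();    /* duplicate IDs possible here */
		alloc_id_atomic();  /* never duplicates */
	}
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, worker, NULL);
	pthread_create(&t2, NULL, worker, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);

	/* After 2 x 100000 racy increments, seq_plain typically ends up
	 * below 200000, i.e. some IDs were handed out more than once;
	 * the atomic counter always reaches exactly 200000. */
	printf("racy: %u  atomic: %u (expected 200000)\n",
	       seq_plain, atomic_load(&seq_atomic));
	return 0;
}

Build with gcc -std=c11 -pthread. Whether this is actually what happens in
qemu depends on which threads can reach the ID assignment; the sketch only
shows that a plain "seq++" is not a safe way to generate unique IDs.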