On Fri, Dec 30, 2011 at 10:35:01AM +0000, Stefan Hajnoczi wrote:
> Are you building with gcc 4.5.3 or later?  (Earlier versions may
> mis-compile, see https://bugs.launchpad.net/qemu/+bug/902148.)

I'm using "gcc version 4.6.2 (Debian 4.6.2-9)", so that should not
be an issue.

> If you can reproduce this bug and suspect coroutines are involved then I

It's entirely reproducible.

I've played around a bit and switched from the ucontext to the gthreads
coroutine implementation.  The result seems odd, but starts to make sense.

Running the workload I now get the following message from qemu:

Co-routine re-entered recursively

and the gdb backtrace looks like:

(gdb) bt
#0  0x00007f2fff36f405 in *__GI_raise (sig=<optimized out>)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007f2fff372680 in *__GI_abort () at abort.c:92
#2  0x00007f30019a6616 in qemu_coroutine_enter (co=0x7f3004d4d7b0, opaque=0x0)
    at qemu-coroutine.c:53
#3  0x00007f30019a5e82 in qemu_co_queue_next_bh (opaque=<optimized out>)
    at qemu-coroutine-lock.c:43
#4  0x00007f30018d5a72 in qemu_bh_poll () at async.c:71
#5  0x00007f3001982990 in main_loop_wait (nonblocking=<optimized out>)
    at main-loop.c:472
#6  0x00007f30018cf714 in main_loop () at /home/hch/work/qemu/vl.c:1481
#7  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
    at /home/hch/work/qemu/vl.c:3479

Adding some printks suggests this happens when calling add_aio_request from
aio_read_response when either delaying creates or updating metadata,
although not every time one of these cases happens.

I've tried to understand how the recursive calling happens, but unfortunately
the whole coroutine code lacks any sort of documentation on how it should
behave or what it asserts about the callers.

> I don't have a sheepdog setup here but if there's an easy way to
> reproduce please let me know and I'll take a look.

With the small patch below applied to the sheepdog source I can reproduce
the issue on my laptop using the following setup:

for port in 7000 7001 7002; do
    mkdir -p /mnt/sheepdog/$port
    /usr/sbin/sheep -p $port -c local /mnt/sheepdog/$port
    sleep 2
done

collie cluster format
collie vdi create test 20G

then start a qemu instance that uses the sheepdog volume using the
following device and drive lines:

	-drive if=none,file=sheepdog:test,cache=none,id=test \
	-device virtio-blk-pci,drive=test,id=testdev \

finally, in the guest run:

	dd if=/dev/zero of=/dev/vdX bs=67108864 count=128 oflag=direct
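
As an aside on the abort in the backtrace above (frame #2, qemu_coroutine_enter): the toy model below sketches one way a "re-entered recursively" guard can work, assuming the check is simply "this coroutine already has a caller recorded". It is not QEMU's code. ToyCoroutine, toy_enter and toy_yield are hypothetical names, no real context switch takes place, and toy_enter returns false where the real code prints the message and aborts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a coroutine; hypothetical, not QEMU's struct. */
typedef struct ToyCoroutine {
    struct ToyCoroutine *caller;  /* non-NULL while the coroutine is entered */
    int entered_count;            /* number of successful enters */
} ToyCoroutine;

/* Represents the main loop context, which is never "entered". */
static ToyCoroutine toy_main;
static ToyCoroutine *toy_current = &toy_main;

/*
 * Enter co from the current context.  Returns false if co is already
 * entered and has not yielded yet, i.e. the recursive re-entry case.
 */
static bool toy_enter(ToyCoroutine *co)
{
    if (co->caller != NULL) {
        return false;             /* already entered, refuse */
    }
    co->caller = toy_current;     /* remember where to return to */
    toy_current = co;
    co->entered_count++;
    return true;
}

/* Yield from the current coroutine back to whoever entered it. */
static void toy_yield(void)
{
    ToyCoroutine *self = toy_current;

    assert(self->caller != NULL); /* yielding outside a coroutine is a bug */
    toy_current = self->caller;
    self->caller = NULL;          /* may be entered again from now on */
}
```

In this model, a second wake-up of the same coroutine that arrives before it has yielded hits the guard. That matches the symptom seen here, where the enter comes from a bottom half (qemu_co_queue_next_bh), but it does not say which caller queued the extra wake-up.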