[Sheepdog] [Qemu-devel] coroutine bug?, was Re: [PATCH] sheepdog: use coroutines
Stefan Hajnoczi
stefanha at gmail.com
Mon Jan 2 23:38:11 CET 2012
On Mon, Jan 2, 2012 at 3:39 PM, Christoph Hellwig <hch at lst.de> wrote:
> On Fri, Dec 30, 2011 at 10:35:01AM +0000, Stefan Hajnoczi wrote:
>> If you can reproduce this bug and suspect coroutines are involved then I
>
> It's entirely reproducible.
>
> I've played around a bit and switched from the ucontext to the gthreads
> coroutine implementation. The result seems odd, but starts to make sense.
>
> Running the workload I now get the following message from qemu:
>
> Co-routine re-entered recursively
>
> and the gdb backtrace looks like:
>
> (gdb) bt
> #0 0x00007f2fff36f405 in *__GI_raise (sig=<optimized out>)
> at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> #1 0x00007f2fff372680 in *__GI_abort () at abort.c:92
> #2 0x00007f30019a6616 in qemu_coroutine_enter (co=0x7f3004d4d7b0, opaque=0x0)
> at qemu-coroutine.c:53
> #3 0x00007f30019a5e82 in qemu_co_queue_next_bh (opaque=<optimized out>)
> at qemu-coroutine-lock.c:43
> #4 0x00007f30018d5a72 in qemu_bh_poll () at async.c:71
> #5 0x00007f3001982990 in main_loop_wait (nonblocking=<optimized out>)
> at main-loop.c:472
> #6 0x00007f30018cf714 in main_loop () at /home/hch/work/qemu/vl.c:1481
> #7 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
> at /home/hch/work/qemu/vl.c:3479
>
> adding some printks suggests this happens when calling add_aio_request from
> aio_read_response when either delaying creates or updating metadata,
> although not every time one of these cases happens.
>
> I've tried to understand how the recursive calling happens, but unfortunately
> the whole coroutine code lacks any documentation of how it should
> behave or what it expects of its callers.
>
>> I don't have a sheepdog setup here but if there's an easy way to
>> reproduce please let me know and I'll take a look.
>
> With the small patch below applied to the sheepdog source I can reproduce
> the issue on my laptop using the following setup:
>
> for port in 7000 7001 7002; do
> mkdir -p /mnt/sheepdog/$port
> /usr/sbin/sheep -p $port -c local /mnt/sheepdog/$port
> sleep 2
> done
>
> collie cluster format
> collie vdi create test 20G
>
> then start a qemu instance that uses the sheepdog volume with the
> following device and drive lines:
>
> -drive if=none,file=sheepdog:test,cache=none,id=test \
> -device virtio-blk-pci,drive=test,id=testdev \
>
> finally, in the guest run:
>
> dd if=/dev/zero of=/dev/vdX bs=67108864 count=128 oflag=direct
Thanks for these instructions. I can reproduce the issue here.
The way that BDRVSheepdogState->co_recv and ->co_send work seems
suspicious. The code adds select(2) read/write callback functions on the
sheepdog socket file descriptor. When the socket becomes writeable or
readable, the co_send or co_recv coroutine is entered. So far, so good;
this is how a coroutine is integrated into the main loop of QEMU.
The problem is that this patch mixes the two. The co_recv coroutine
runs aio_read_response(), which invokes send_pending_req().
send_pending_req() invokes add_aio_request(). That function isn't
suitable for co_recv's context because it actually sends data and hits a
few blocking (yield) points. It takes a coroutine mutex, but the
select(2) read callback is still in place. We're still in the
aio_read_response() call chain except we're not reading at all, we're
trying to write! And we'll get spurious wakeups whenever there is data
readable on the socket.
So the co_recv coroutine has two things in the system that will try to enter it:
1. The select(2) read callback on the sheepdog socket.
2. The add_aio_request() blocking operations, including a coroutine mutex.
This is bad: a yielded coroutine should have only one thing that will
enter it. It's rare that it makes sense to have multiple things entering
a coroutine.
It's late here but I hope this gives Kazutaka some thoughts on what is
causing the issue with this patch.
Stefan