[Sheepdog] [Qemu-devel] coroutine bug?, was Re: [PATCH] sheepdog: use coroutines
Stefan Hajnoczi
stefanha at gmail.com
Fri Dec 30 11:35:01 CET 2011
On Thu, Dec 29, 2011 at 01:06:26PM +0100, Christoph Hellwig wrote:
> On Fri, Dec 23, 2011 at 02:38:50PM +0100, Christoph Hellwig wrote:
> > FYI, this causes segfaults when doing large streaming writes when
> > running against a sheepdog cluster which:
> >
> > a) has relatively fast SSDs
> >
> > and
> >
> > b) uses buffered I/O.
> >
> > Unfortunately I can't get a useful backtrace out of gdb. When running just
> > this commit I at least get some debugging messages:
> >
> > qemu-system-x86_64: failed to recv a rsp, Socket operation on non-socket
> > qemu-system-x86_64: failed to get the header, Socket operation on non-socket
> >
> > but on least qemu these don't show up either.
>
> s/least/latest/
>
> Some more debugging. Just for the call that eventually segfaults, s->fd
> turns from its normal value (normally 13 for me) into 0. This is entirely
> reproducible in my testing, and given that the sheepdog driver never
> assigns to that value except when opening the device, this seems to point
> to an issue in the coroutine code to me.

Are you building with gcc 4.5.3 or later? (Earlier versions may
mis-compile, see https://bugs.launchpad.net/qemu/+bug/902148.)

If you can reproduce this bug and suspect coroutines are involved, then I
suggest using gdb to note the address of s and its last valid field
values. When the coroutine re-enters, make sure that s still has the same
address and check whether the field values are the same as before.
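
If doing this by hand in gdb gets tedious, the same before/after comparison
can be sketched in C. This is only an illustration of the check described
above, not sheepdog or QEMU code: DemoState, StateSnapshot, snapshot_take()
and snapshot_check() are hypothetical names, and the flags field is made up.
Only the fd field corresponds to something mentioned in this thread.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the driver state. Only the fd field is taken
 * from the thread; the struct name and the flags field are invented. */
typedef struct DemoState {
    int fd;
    int flags;
} DemoState;

/* What was last known to be valid: where s lived and what it contained. */
typedef struct StateSnapshot {
    const DemoState *addr;
    DemoState copy;
} StateSnapshot;

enum { STATE_OK = 0, STATE_MOVED = 1, STATE_FIELDS_CHANGED = 2 };

/* Record the address and field values of s, e.g. just before yielding. */
StateSnapshot snapshot_take(const DemoState *s)
{
    StateSnapshot snap;
    snap.addr = s;
    snap.copy = *s;
    return snap;
}

/* Compare s against the snapshot, e.g. right after the coroutine re-enters.
 * A different address suggests the wrong pointer was restored; the same
 * address with different fields suggests the memory was overwritten. */
int snapshot_check(const StateSnapshot *snap, const DemoState *s)
{
    if (s != snap->addr) {
        fprintf(stderr, "s moved: %p -> %p\n",
                (const void *)snap->addr, (const void *)s);
        return STATE_MOVED;
    }
    if (s->fd != snap->copy.fd || s->flags != snap->copy.flags) {
        fprintf(stderr, "fields changed: fd %d -> %d, flags %d -> %d\n",
                snap->copy.fd, s->fd, snap->copy.flags, s->flags);
        return STATE_FIELDS_CHANGED;
    }
    return STATE_OK;
}
```

Taking the snapshot before the yield and checking it after re-entry tells
the two cases apart: STATE_MOVED means s itself is no longer the pointer
you had before, while STATE_FIELDS_CHANGED means s is intact but something
wrote to the fields (for example fd going from 13 to 0).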

I don't have a sheepdog setup here, but if there's an easy way to
reproduce this, please let me know and I'll take a look.

Stefan