[Sheepdog] [Qemu-devel] coroutine bug?, was Re: [PATCH] sheepdog: use coroutines

Stefan Hajnoczi stefanha at gmail.com
Fri Dec 30 11:35:01 CET 2011


On Thu, Dec 29, 2011 at 01:06:26PM +0100, Christoph Hellwig wrote:
> On Fri, Dec 23, 2011 at 02:38:50PM +0100, Christoph Hellwig wrote:
> > FYI, this causes segfaults when doing large streaming writes when
> > running against a sheepdog cluster which:
> > 
> >   a) has relatively fast SSDs
> > 
> > and
> > 
> >   b) uses buffered I/O.
> > 
> > Unfortunately I can't get a useful backtrace out of gdb.  When running just
> > this commit I at least get some debugging messages:
> > 
> > qemu-system-x86_64: failed to recv a rsp, Socket operation on non-socket
> > qemu-system-x86_64: failed to get the header, Socket operation on non-socket
> > 
> > but on least qemu these don't show up either.
> 
> s/least/latest/
> 
> Some more debugging.  Just for the call that eventually segfaults s->fd
> turns from its normal value (normally 13 for me) into 0.  This is entirely
> reproducible in my testing, and given that the sheepdog driver never
> assigns to that value except opening the device this seems to point to
> an issue in the coroutine code to me.

Are you building with gcc 4.5.3 or later?  (Earlier versions may
mis-compile, see https://bugs.launchpad.net/qemu/+bug/902148.)

If you can reproduce this bug and suspect coroutines are involved, then I
suggest using gdb to record the address of s and its last valid field
values.  When the coroutine is re-entered, make sure that s still has the
same address and check whether the field values are the same as before.
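
If gdb is awkward to use here, the same before/after comparison could be
done with temporary instrumentation.  The following is only a minimal
sketch: DemoState, StateSnapshot, snapshot_state() and check_state() are
hypothetical names made up for illustration, not part of the sheepdog
driver, and the real state struct has many more fields than the two
shown.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the driver state (assumption, not the real struct). */
typedef struct DemoState {
    int fd;
    uint32_t flags;
} DemoState;

/* What we remember just before the coroutine yields. */
typedef struct StateSnapshot {
    const DemoState *addr;  /* address s had before the yield */
    DemoState copy;         /* field values s had before the yield */
} StateSnapshot;

/* Record the address and the current field values of s. */
static StateSnapshot snapshot_state(const DemoState *s)
{
    StateSnapshot snap;
    snap.addr = s;
    snap.copy = *s;
    return snap;
}

/*
 * Compare s against the snapshot after the coroutine resumes.
 * Returns 0 if nothing changed, 1 if the pointer itself differs,
 * 2 if fd changed, 3 if flags changed.
 */
static int check_state(const StateSnapshot *snap, const DemoState *s)
{
    if (s != snap->addr) {
        fprintf(stderr, "s moved: %p -> %p\n",
                (const void *)snap->addr, (const void *)s);
        return 1;
    }
    if (s->fd != snap->copy.fd) {
        fprintf(stderr, "s->fd changed: %d -> %d\n", snap->copy.fd, s->fd);
        return 2;
    }
    if (s->flags != snap->copy.flags) {
        fprintf(stderr, "s->flags changed: %u -> %u\n",
                (unsigned)snap->copy.flags, (unsigned)s->flags);
        return 3;
    }
    return 0;
}
```

The idea would be to call snapshot_state(s) right before the point where
the coroutine yields and check_state(&snap, s) right after it resumes.  A
return value of 1 would mean the coroutine came back with a different s
pointer, while 2 would mean the memory behind the same pointer was
modified in between.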

I don't have a sheepdog setup here, but if there's an easy way to
reproduce this, please let me know and I'll take a look.

Stefan


