On Thu, Dec 29, 2011 at 01:06:26PM +0100, Christoph Hellwig wrote:
> On Fri, Dec 23, 2011 at 02:38:50PM +0100, Christoph Hellwig wrote:
> > FYI, this causes segfaults when doing large streaming writes when
> > running against a sheepdog cluster which:
> >
> >  a) has relatively fast SSDs
> >
> > and
> >
> >  b) uses buffered I/O.
> >
> > Unfortunately I can't get a useful backtrace out of gdb.  When running just
> > this commit I at least get some debugging messages:
> >
> > qemu-system-x86_64: failed to recv a rsp, Socket operation on non-socket
> > qemu-system-x86_64: failed to get the header, Socket operation on non-socket
> >
> > but on least qemu these don't show up either.
>
> s/least/latest/
>
> Some more debugging.  Just for the call that eventually segfaults s->fd
> turns from its normal value (normally 13 for me) into 0.  This is entirely
> reproducible in my testing, and given that the sheepdog driver never
> assigns to that value except when opening the device this seems to point
> to an issue in the coroutine code to me.

Are you building with gcc 4.5.3 or later?  (Earlier versions may
mis-compile, see https://bugs.launchpad.net/qemu/+bug/902148.)

If you can reproduce this bug and suspect coroutines are involved then I
suggest using gdb to observe the last valid field values of s and the
address of s.  When the coroutine re-enters, make sure that s still has
the same address and check whether the field values are the same as
before.

I don't have a sheepdog setup here but if there's an easy way to
reproduce please let me know and I'll take a look.

Stefan