[sheepdog-users] sheepdog replication got stuck
namei.unix at gmail.com
Fri Dec 27 08:50:36 CET 2013
On Mon, Dec 23, 2013 at 02:15:55PM +0100, Gerald Richter - ECOS wrote:
> > So what is the problem? Does 'qemu-img convert' hang and never finish?
> On the first host (where qemu-img runs) I have:
> vm-61025-disk-1 0 20 GB 15 GB 0.0 MB 2013-12-19 09:52 bb7a25 3
> on the second one I have:
> vm-61025-disk-1 0 20 GB 36 MB 0.0 MB 2013-12-19 09:52 bb7a25 3
> Regardless of whether qemu-img hangs, I expect the second machine to show the same "Used" value as the first one (once the cached content has had time to be pushed over the network).
> The other question is why qemu-img hangs. My guess (which may be wrong) is that it issued a flush at the end of the import and is now waiting until the cache has been flushed to all nodes. That is how I understand, from the docs, that it should work.
> At least, running strace and lsof on the qemu-img process shows that it is waiting on the sheepdog server (a select() on the sheepdog socket connection).
> It may be relevant that I run QEMU 1.4, because that is what ships with the distribution I use (Proxmox). It contains a bunch of patches, so it's not easy to compile from source.
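To check mechanically whether the two nodes agree on the "Used" value of a VDI, the two listings can be compared field by field. This is a minimal sketch, assuming the column layout of the lines quoted above (name, id, size, used, shared, creation date and time, VDI id, copies), with each size written as a number followed by a unit. `vdi_used` and `same_used` are hypothetical helper names, not sheepdog commands.

```shell
# Hypothetical helpers (not part of sheepdog). Assumed input layout, taken
# from the listing lines quoted above:
#   name id size unit used unit shared unit date time vdi-id copies
# e.g. "vm-61025-disk-1 0 20 GB 15 GB 0.0 MB 2013-12-19 09:52 bb7a25 3"

# Read a listing on stdin and print the "Used" value and unit
# (fields 5 and 6) of the first line whose first field is the VDI name $1.
vdi_used() {
    local vdi="$1"
    awk -v name="$vdi" '$1 == name { print $5, $6; exit }'
}

# Compare the "Used" value of VDI $1 in two listings ($2 and $3).
# Prints both values and returns 0 only if the VDI is present and the
# values are identical.
same_used() {
    local vdi="$1" list_a="$2" list_b="$3"
    local used_a used_b
    used_a=$(printf '%s\n' "$list_a" | vdi_used "$vdi")
    used_b=$(printf '%s\n' "$list_b" | vdi_used "$vdi")
    echo "node A: ${used_a:-missing}  node B: ${used_b:-missing}"
    [ -n "$used_a" ] && [ "$used_a" = "$used_b" ]
}
```

For example, capture the `vdi list` output on each host (the command is `collie vdi list`, or `dog vdi list` in newer sheepdog releases) and pass both strings to `same_used vm-61025-disk-1 "$list_host1" "$list_host2"`; with the two lines quoted above it prints `node A: 15 GB  node B: 36 MB` and returns a non-zero status. Matching on the first field is a simplification: lines whose first field is a marker rather than the VDI name would need adjusting.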
This QEMU is very old and has known bugs in its sheepdog code. I'd suggest you
compile QEMU yourself: the result would be much more stable and, more
importantly, would include auto-reconnection support for sheepdog. You can
build it with the following commands:
$ git clone https://github.com/sheepdog/qemu.git # based on the latest official QEMU, with minor fixes
$ cd qemu
$ ./configure --target-list=x86_64-softmmu
$ sudo make install