[sheepdog-users] sheepdog replication got stuck

Liu Yuan namei.unix at gmail.com
Fri Dec 27 08:50:36 CET 2013


On Mon, Dec 23, 2013 at 02:15:55PM +0100, Gerald Richter - ECOS wrote:
> Hi,
> 
> > 
> > So what is the problem? Does 'qemu-img convert' hang and never finish?
> > 
> 
> On the first host (where the qemu-img runs) I have:
> 
> vm-61025-disk-1     0   20 GB   15 GB  0.0 MB 2013-12-19 09:52   bb7a25     3
> 
> on the second one I have:
> 
> vm-61025-disk-1     0   20 GB   36 MB  0.0 MB 2013-12-19 09:52   bb7a25     3
> 
> Regardless of whether qemu-img hangs, I expect the second machine to show the same "Used" value as the first one (after the time it takes to push the cached content over the network).
> 
> The other question is why qemu-img hangs. My guess (which may be wrong) is that it issued a flush at the end of the import and is now waiting until the cache has been flushed to all nodes. From the docs, that is how I understand it should work.
> 
> At least, running strace and lsof on the qemu-img process shows that it is waiting for the sheepdog server (select on the sheepdog socket connection).
> 
> It may matter that I run QEMU 1.4, because that is the version shipped with the distribution I use (Proxmox). It contains a bunch of patches, so it's not easy to compile from source.
> 

This QEMU version is very old and its sheepdog code has known bugs. I'd suggest
you compile QEMU yourself: it will be much more stable and, more importantly,
will include auto-reconnection support for sheepdog. Use the following commands:

$ git clone https://github.com/sheepdog/qemu.git # based on the latest official QEMU, with minor fixes
$ cd qemu
$ ./configure --target-list=x86_64-softmmu
$ make
$ sudo make install
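
After the install, a quick check that the new qemu-img was built with
sheepdog support might look like the sketch below. It assumes the default
install prefix (/usr/local) and that 'qemu-img --help' prints a
"Supported formats:" line; the function name supports_sheepdog is made up
for illustration only.

```shell
#!/bin/sh
# Succeeds (exit 0) if the "Supported formats:" line read from stdin
# lists sheepdog as a separate word; fails otherwise.
supports_sheepdog() {
    grep -Eq '^Supported formats:.* sheepdog( |$)'
}

# Assumed location: QEMU's configure uses the /usr/local prefix by default.
QEMU_IMG=/usr/local/bin/qemu-img

# Only run the check if the binary is actually there.
if [ -x "$QEMU_IMG" ]; then
    if "$QEMU_IMG" --help | supports_sheepdog; then
        echo "sheepdog support: yes"
    else
        echo "sheepdog support: no"
    fi
fi
```

If this reports "no", or the distribution's older qemu-img is still first in
PATH, the import would keep using the old binary.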

Thanks
Yuan
