[sheepdog-users] sheepdog replication got stuck

Tue Jan 7 06:55:34 CET 2014

Hi,

> >
> > 1. Can you reproduce this problem without object cache?

I turned off the object cache on the node where the import is running, and the import then worked without any problems. So the issue seems to be related to the object cache.

Moreover, I had one VM in the sheepdog cluster that was not involved in the test but held real data. After I deleted the cache and restarted the node for the test, all changes in this VM were gone, because the cache had never been flushed to the other node. It seems to me that the cache flush gets stuck.

> > 2. How do you "import all images at the same time"? Executing qemu-img
> in parallel?

Yes, I just run:

time qemu-img convert -p -t writeback tst-bb-sec-tstmaster1.vhd sheepdog:vm-61012-disk-1 &
time qemu-img convert -p -t writeback tst-bb-sec-tstmaster2.vhd sheepdog:vm-61022-disk-1 &

and so on.
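
The same pattern can be scripted for the whole batch. This is a minimal sketch for a POSIX shell: the function name import_all, the QEMU_IMG override and the "file:vdi" pair format are illustrative assumptions, not the commands that were actually run. Only the qemu-img options are taken from the two lines above.

```shell
#!/bin/sh
# Sketch: start one qemu-img import per image in the background,
# then wait until all of them have finished.
# QEMU_IMG is a hypothetical override (e.g. QEMU_IMG=echo for a dry run).
QEMU_IMG="${QEMU_IMG:-qemu-img}"

import_all() {
    # Each argument is a "source-file:vdi-name" pair (illustrative format).
    for pair in "$@"; do
        src="${pair%%:*}"   # part before the first colon
        vdi="${pair#*:}"    # part after the first colon
        "$QEMU_IMG" convert -p -t writeback "$src" "sheepdog:$vdi" &
    done
    # wait without operands blocks until all background jobs are done;
    # it does not report the exit status of the individual imports.
    wait
}
```

It would be called for example as: import_all tst-bb-sec-tstmaster1.vhd:vm-61012-disk-1 tst-bb-sec-tstmaster2.vhd:vm-61022-disk-1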

> > 3. Can you check CPU usage of sheep daemon when its hangs?
> 

As far as I remember, this is not the case, but this test is still pending. I now have about 400GB in the cluster, so restarting sheep and the recovery take several hours every time, and during that time even deleting a vdi takes very long (I am not sure whether the vdi needs to be recovered before it can be deleted).

I will send the final answer as soon as I am able to run the next test.

> Last mail Gerald mentioned he used an old qemu. I am wondering if the
> qemu the culprit.
> 

I don't think it's related to qemu, because running with the object cache turned off works with the same qemu version.

Regards

Gerald