[sheepdog-users] sheepdog replication got stuck

Liu Yuan namei.unix at gmail.com
Tue Jan 7 07:17:18 CET 2014


On Tue, Jan 07, 2014 at 06:55:34AM +0100, Gerald Richter - ECOS wrote:
> Hi,
> 
> 
> > >
> > > 1. Can you reproduce this problem without object cache?
> 
> I turned off the object cache on the node where the import is running and it worked without any problem. So it seems to be related to the object cache.
> 
> Moreover I had one VM in the sheepdog cluster, which was not involved in the test, but which had real data. After I deleted the cache and restarted the node for the test, all changes in this VM were gone, because the cache was never flushed to the other node. It seems to me that the cache flush gets stuck.

This probably means your QEMU doesn't send FLUSH requests to the object cache.

> 
> > > 2. How do you "import all images at the same time"? Executing qemu-img in parallel?
> 
> Yes, just doing a 
> 
> time qemu-img convert -p -t writeback tst-bb-sec-tstmaster1.vhd sheepdog:vm-61012-disk-1 &
> time qemu-img convert -p -t writeback tst-bb-sec-tstmaster2.vhd sheepdog:vm-61022-disk-1 &
> 
> and so on.
> 
> > > 3. Can you check CPU usage of the sheep daemon when it hangs?
> > 
> 
> As far as I remember this is not the case, but this test is still pending. I now have about 400GB in the cluster, so restarting sheep and recovery take several hours every time, and during that time even deleting a vdi takes very long (not sure whether the vdi needs to be recovered before it can be deleted).
> 
> I will send the final answer as soon as I am able to run the next test.
> 
> > In his last mail Gerald mentioned he used an old qemu. I am wondering if
> > qemu is the culprit.
> > 
> 
> I don't think it's related to qemu, because running with object cache turned off works with the same qemu version
> 

Old QEMU versions either don't support object cache or don't support it well.
To verify whether your QEMU works with object cache, please try:

1 start up sheepdog with object cache enabled
2 run a VM with the QEMU option '-drive cache=writeback'
3 generate a big file (several GB is enough) inside the VM
4 run 'dog vdi cache info' to see how much data is dirty for this VM
5 inside the VM, execute 'sync' in bash
6 after sync returns, run 'dog vdi cache info' again to see if the dirty data is gone
7 shut down your VM
8 run 'dog vdi cache info' to see whether the cache for this VM is released completely
9 boot the VM again and see if the changes are still there
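
To make the three 'dog vdi cache info' checks (steps 4, 6 and 8) easier to
compare, the output can be saved after each step and diffed afterwards. The
following is only a rough sketch: the function names snapshot_cache and
compare_cache, the DOG and OUT_DIR variables and the default directory are
made up for illustration, and nothing is assumed about the format of the
command's output. The steps inside the VM (2, 3, 5, 7 and 9) still have to be
done by hand.

```shell
# Hypothetical helpers; only 'dog vdi cache info' comes from the steps above.
DOG="${DOG:-dog}"                               # assumed: dog binary is in PATH
OUT_DIR="${OUT_DIR:-/tmp/sheep-cache-check}"    # assumed scratch directory

# Save the current 'dog vdi cache info' output under a label.
# $1 = label, e.g. "dirty", "synced", "stopped"
snapshot_cache() {
    mkdir -p "$OUT_DIR" || return 1
    "$DOG" vdi cache info > "$OUT_DIR/$1.txt" 2>&1
}

# Show what changed between two saved snapshots.
# $1, $2 = labels previously passed to snapshot_cache
compare_cache() {
    diff -u "$OUT_DIR/$1.txt" "$OUT_DIR/$2.txt"
}
```

For example, run 'snapshot_cache dirty' after step 3, 'snapshot_cache synced'
after step 5 and 'snapshot_cache stopped' after step 7, then
'compare_cache dirty synced' and 'compare_cache synced stopped' to see whether
the reported cache state changed at each stage.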

Thanks
Yuan


