[sheepdog-users] sheepdog replication got stuck

Tue Jan 7 07:31:18 CET 2014

Hi Liu,

thanks for your feedback. I will try the test you mentioned later on, when the recovery has finished.

Just one question: my understanding was that the sheep daemon flushes the cache on its own after some time. Is there such a mechanism, or is the cache only flushed when a sync comes from the VM?

Thanks & Regards

Gerald

> -----Original Message-----
> From: Liu Yuan [mailto:namei.unix at gmail.com]
> Sent: Tuesday, 7 January 2014 07:17
> To: Gerald Richter
> Cc: Hitoshi Mitake; Lista sheepdog user
> Subject: Re: [sheepdog-users] sheepdog replication got stuck
> 
> On Tue, Jan 07, 2014 at 06:55:34AM +0100, Gerald Richter - ECOS wrote:
> > Hi,
> >
> >
> > > >
> > > > 1. Can you reproduce this problem without object cache?
> >
> > I turned off the object cache on the node where the import is running and
> > it worked without any problem. So it seems to be related to the object
> > cache.
> >
> > Moreover, I had one VM in the sheepdog cluster which was not involved in
> > the test but which had real data. After I deleted the cache and restarted the
> > node for the test, all changes in this VM were gone, because the cache was
> > never flushed to the other node. It seems to me that the flush of the cache
> > gets stuck.
> 
> This probably means your QEMU doesn't send FLUSH to object cache.
> 
> >
> > > > 2. How do you "import all images at the same time"? Executing
> > > > qemu-img in parallel?
> >
> > Yes, just doing a
> >
> > time qemu-img convert -p -t writeback tst-bb-sec-tstmaster1.vhd sheepdog:vm-61012-disk-1 &
> > time qemu-img convert -p -t writeback tst-bb-sec-tstmaster2.vhd sheepdog:vm-61022-disk-1 &
> >
> > and so on.
> >
> > > > 3. Can you check the CPU usage of the sheep daemon when it hangs?
> > >
> >
> > As far as I remember this is not the case, but this test is still
> > pending, because I now have about 400GB in the cluster, and
> > restarting sheep and recovery take several hours every time. During
> > this time even deleting a vdi takes very long (not sure whether the vdi
> > needs to be recovered before it can be deleted?)
> >
> > I will send the final answer as soon as I am able to run the next test.
> >
> > > In his last mail Gerald mentioned that he used an old qemu. I am wondering if
> > > qemu is the culprit.
> > >
> >
> > I don't think it's related to qemu, because running with object cache
> > turned off works with the same qemu version.
> >
> 
> Old QEMU either doesn't support object cache or doesn't support it well. To verify
> whether yours works with object cache, please try:
> 
> 1 start up sheepdog with object cache enabled
> 2 run a VM with the QEMU option '-drive cache=writeback'
> 3 generate a big file (several GB is good enough) inside the VM
> 4 try 'dog vdi cache info' to see how much data is dirty for this VM
> 5 inside the VM, execute 'sync' in bash
> 6 after sync returns, try 'dog vdi cache info' to see if the dirty data is gone
> 7 shut down your VM
> 8 try 'dog vdi cache info' to see whether the cache for this VM is released completely
> 9 reboot the VM and see if the changes are still there
> 
> Thanks
> Yuan
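
The host-side checks in steps 4, 6 and 8 of the procedure above could be recorded with a small helper like the one below, so that the three reports can be compared afterwards. This is only a sketch: the function name and the DOG and OUT_DIR variables are made up for illustration, and the output of 'dog vdi cache info' is saved as-is rather than parsed, since its exact format is not given in this thread.

```shell
#!/bin/sh
# Sketch only: saves the output of 'dog vdi cache info' at each checkpoint.
# DOG and OUT_DIR are illustrative variables, not sheepdog settings.
DOG="${DOG:-dog}"          # command used to query the cluster
OUT_DIR="${OUT_DIR:-.}"    # directory where the reports are written

# Save the current cache report under a label and also print it.
record_cache_info() {
    label="$1"
    "$DOG" vdi cache info | tee "$OUT_DIR/cache-info-$label.txt"
}

# Run by hand on the sheepdog host at the matching step:
#   record_cache_info after-write      # step 4: dirty data expected
#   record_cache_info after-sync       # step 6: dirty data should be gone
#   record_cache_info after-shutdown   # step 8: cache should be released
```

Comparing the three saved files (for example with diff) shows whether the dirty data disappears after 'sync' and whether the cache is released after the VM is shut down.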