[sheepdog-users] sheepdog replication got stuck

Gerald Richter - ECOS richter at ecos.de
Tue Jan 7 08:13:21 CET 2014


Hi,

I am pretty sure that the VM where I lost the data had more than 40 MB of changed data.

From what I am seeing here when qemu-img imports a VM, everything works fine as long as there is only one import process. As soon as I run multiple imports in parallel, the object cache flusher hangs at some point (not always the same point) and nothing gets flushed anymore. When the import is finished, qemu-img tries to sync, but this hangs because the object cache flush hangs, so the import never ends.

BTW, during the last hour while we have been writing emails, I was only able to delete three VMs (the fourth is still in progress) because recovery is still running (without recovery running, a delete normally takes only a few seconds). Does the vdi need to be fully recovered before it can be deleted?

So the test you suggested is still outstanding...

Regards

Gerald

P.S. I think having a cache flush after some (maybe configurable) time period would be a good idea.

> -----Original Message-----
> From: Liu Yuan [mailto:namei.unix at gmail.com]
> Sent: Tuesday, January 7, 2014 07:54
> To: Gerald Richter
> Cc: Hitoshi Mitake; Lista sheepdog user
> Subject: Re: [sheepdog-users] sheepdog replication got stuck
> 
> On Tue, Jan 07, 2014 at 07:31:18AM +0100, Gerald Richter - ECOS wrote:
> > Hi Liu,
> >
> > thanks for your feedback. I will try the test you mentioned later on, when
> the recovery has finished.
> >
> > Just one question: my understanding was that the sheep daemon
> flushes the cache after some time on its own. Is there something like this, or
> is the cache only flushed when a sync comes from the VM?
> >
> 
> Yes, we have a background flusher that is triggered periodically if more
> than 10 objects (40 MB) are dirty. But if you have less than 40 MB of dirty
> data, no background flusher is triggered, and the data can only be flushed by
> proactive flush requests from the VM's filesystem or by 'sync|fsync' from
> user space.
> 
> I am thinking of adding yet another background flusher that is triggered by a
> time interval, e.g. once every 30s, to flush any dirty objects. This would work
> around the buggy QEMU problem.
> 
> Thanks
> Yuan
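
The flush policy discussed in the quoted message can be sketched roughly as follows. This is only an illustration of the decision logic (the existing count threshold plus the proposed time interval), not sheepdog's actual code; the names CacheState and should_flush are hypothetical, and the 30-second interval is just the example value mentioned above.

```python
from dataclasses import dataclass

# Values taken from the quoted message; everything else here is an assumption.
DIRTY_THRESHOLD = 10      # flusher triggers when MORE than 10 objects are dirty
OBJECT_SIZE_MB = 4        # 40 MB / 10 objects
FLUSH_INTERVAL_S = 30.0   # proposed time-based trigger ("e.g. 30s")


@dataclass
class CacheState:
    """Hypothetical snapshot of the object cache."""
    dirty_objects: int = 0
    last_flush: float = 0.0   # timestamp of the last flush, in seconds


def should_flush(state, now,
                 threshold=DIRTY_THRESHOLD,
                 interval=FLUSH_INTERVAL_S):
    """Return True if a background flush should start at time `now`."""
    if state.dirty_objects == 0:
        return False                      # nothing to write back
    if state.dirty_objects > threshold:
        return True                       # existing count-based trigger
    # Proposed addition: flush small amounts of dirty data once they are old.
    return now - state.last_flush >= interval


if __name__ == "__main__":
    small = CacheState(dirty_objects=5, last_flush=0.0)   # 20 MB dirty
    print(should_flush(small, now=10.0))   # below threshold, interval not reached
    print(should_flush(small, now=30.0))   # below threshold, interval reached
```

With only the count-based trigger, the 5-object case would never be flushed without a sync from the VM; the time-based check is what covers it.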
