[sheepdog-users] sheepdog replication got stuck

Liu Yuan namei.unix at gmail.com
Tue Jan 7 08:23:33 CET 2014


On Tue, Jan 07, 2014 at 08:13:21AM +0100, Gerald Richter - ECOS wrote:
> Hi,
> 
> I am pretty sure that the VM where I lost the data had more than 40 MB of changed data.
> 
> From what I am seeing here when qemu-img imports the VM: as long as there is only one import process, it works fine. As soon as I run multiple imports in parallel, the object cache flusher hangs at some point (not always the same point) and nothing gets flushed anymore. When the import is finished, qemu-img tries to sync, but this hangs because the object cache flush hangs, so the import never ends.

So this strongly indicates an object cache bug in parallel importing. But I am
currently busy with other stuff.

Hitoshi, do you have time to dig into this problem? If it is reliably
reproducible, I think it is not hard to solve.
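
For anyone trying to reproduce this, something like the following might help.
It is only a rough sketch, not a tested reproducer: it starts several imports
at once and reports which ones have not finished by a deadline. The helper
names, the raw image format and the timeout are my assumptions, not details
from Gerald's setup. It also assumes the sheep daemons run with object cache
enabled and that qemu-img is built with sheepdog support.

```python
import subprocess
import time


def build_import_cmd(image_path, vdi_name):
    """Build a qemu-img command that imports a raw image into a sheepdog VDI.

    The raw source format is an assumption; change "-f raw" to match the image.
    """
    return ["qemu-img", "convert", "-f", "raw",
            image_path, "sheepdog:" + vdi_name]


def run_parallel(commands, timeout_s):
    """Start all commands at once and share one deadline between them.

    Returns the indices of the commands that were still running at the
    deadline. Those processes are killed here; skip the kill if you want
    to attach a debugger to a hung import instead.
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]
    deadline = time.monotonic() + timeout_s
    hung = []
    for i, proc in enumerate(procs):
        remaining = max(0.0, deadline - time.monotonic())
        try:
            proc.wait(timeout=remaining)
        except subprocess.TimeoutExpired:
            hung.append(i)
            proc.kill()
            proc.wait()
    return hung
```

To use it, build one command per image with build_import_cmd() (hypothetical
paths and VDI names, e.g. "/tmp/vm0.raw" and "test-vdi-0") and pass the list to
run_parallel() with a generous timeout. A non-empty result on repeated runs
with several images, but not with a single one, would match what Gerald
describes.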

>
> BTW, during the last hour while we were writing emails I was only able to delete three VMs (the fourth is still in progress) because recovery is still running (without recovery running it normally takes only a few seconds). Does the VDI need to be fully recovered before it can be deleted?

This is really a corner case; I think sheepdog should support it well, at least
in theory. I'd suggest you try deleting a VDI during recovery, see what happens,
and report any bugs.

Thanks
Yuan


