[sheepdog-users] sheepdog replication got stuck

Gerald Richter - ECOS gerald.richter at ecos.de
Tue Jan 7 16:51:46 CET 2014


Hi,

I have rerun the parallel import with the cache enabled. As I expected, the cache did not get flushed. The import finished more than one hour ago, but the cache statistics have not changed since then:

dog vdi cache info
Name	Tag	Total	Dirty	Clean
vm-61037-disk-1		19 GB	0.0 MB	19 GB
vm-61022-disk-1		18 GB	18 GB	48 MB
vm-61012-disk-1		18 GB	18 GB	40 MB
vm-61036-disk-1		19 GB	0.0 MB	19 GB
vm-61025-disk-1		14 GB	0.0 MB	14 GB
vm-61026-disk-1		19 GB	0.0 MB	19 GB
vm-61032-disk-1		19 GB	19 GB	44 MB
vm-61015-disk-1		15 GB	0.0 MB	15 GB
vm-61035-disk-1		17 GB	17 GB	48 MB

Cache size 1.7 GB, used 1.9 GB

The sheep process shows some CPU usage (between 0% and 3%), but not much.

The interesting part is the cache size numbers:

- The size of the cache directory is actually 158G
- The sum of the above Totals is 159G 
- cache info reports " Cache size 1.7 GB, used 1.9 GB"
- sheep is started with -w size=100000 (which should be 100G)

The disk is also 99% full, so the cache may simply have run out of space. But why does the cache use 158G if it is configured to use 100G?
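The numbers above can be cross-checked with a little arithmetic. This is a minimal sketch, assuming that the `-w size=` value is given in MB (MiB), which matches the "should be 100G" reading. It sums the per-VDI columns from the `dog vdi cache info` output (the displayed values are rounded, so their sum comes out at 158 GB rather than the 159G measured). Purely as an untested hypothesis, it also shows what the configured size would turn into if it were kept in an unsigned 32-bit counter. The helpers `configured_cache_bytes` and `wrap_u32` are hypothetical and not part of sheepdog.

```python
# Cross-check of the cache numbers reported by `dog vdi cache info`.
# ASSUMPTION: `-w size=` is in MB (MiB); 32-bit truncation is only a hypothesis.

MIB = 1024 * 1024
GIB = 1024 * MIB

# "Total" column in GB, as displayed (already rounded by dog).
TOTALS_GB = [19, 18, 18, 19, 14, 19, 19, 15, 17]
# "Dirty" column in GB (the 0.0 MB entries are left out).
DIRTY_GB = [18, 18, 19, 17]


def configured_cache_bytes(size_mb: int) -> int:
    """Hypothetical helper: bytes for `-w size=<size_mb>`, assuming MiB units."""
    return size_mb * MIB


def wrap_u32(value: int) -> int:
    """Hypothetical helper: the value as seen through an unsigned 32-bit counter."""
    return value % 2**32


configured = configured_cache_bytes(100000)
print(f"sum of displayed Totals: {sum(TOTALS_GB)} GB")
print(f"sum of Dirty: {sum(DIRTY_GB)} GB")
print(f"configured cache: {configured / GIB:.1f} GiB")
print(f"configured cache truncated to 32 bits: {wrap_u32(configured) / GIB:.1f} GiB")
```

Under these assumptions it reports 158 GB of totals, 72 GB still dirty, 97.7 GiB configured, and 1.7 GiB for the truncated value. That last figure happens to match the "Cache size 1.7 GB" line, but whether sheep really stores the size in a 32-bit field would have to be confirmed in the sheep source.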

I will do more tests later on.

Thanks for your support

Regards

Gerald

> -----Original Message-----
> From: Hitoshi Mitake [mailto:mitake.hitoshi at gmail.com]
> Sent: Tuesday, January 7, 2014 15:03
> To: Liu Yuan
> Cc: Gerald Richter; Hitoshi Mitake; Lista sheepdog user
> Subject: Re: [sheepdog-users] sheepdog replication got stuck
> 
> At Tue, 7 Jan 2014 15:23:33 +0800,
> Liu Yuan wrote:
> >
> > On Tue, Jan 07, 2014 at 08:13:21AM +0100, Gerald Richter - ECOS wrote:
> > > Hi,
> > >
> > > I am pretty sure that the VM where I lost the data had more than 40 MB
> > > of changed data.
> > >
> > > From what I see when qemu-img imports the VM: as long as there is
> > > only one import process, it works fine. As soon as I run multiple
> > > imports in parallel, the object cache flusher hangs at some point (not
> > > always the same point) and nothing gets flushed anymore. When the
> > > import is finished, qemu-img tries to sync, but this hangs because the
> > > object cache flush hangs, so the import never ends.
> >
> > So this strongly indicates an object cache bug with parallel importing.
> > But I am currently busy with other stuff.
> >
> > Hitoshi, do you have time to dig into this problem? If it is reliably
> > reproducible, I think it is not hard to solve.
> 
> Yes I'll work on it.
> 
> Thanks,
> Hitoshi
