[sheepdog-users] sheepdog replication got stuck

Gerald Richter - ECOS richter at ecos.de
Wed Jan 8 06:34:17 CET 2014


Hi,

I ran a further test regarding the cache size issue.

When I import only 6 images (120GB), which is less than the free disk space, everything is fine.

So the problem does not seem to be related to the parallel operation; rather, all cache operations seem to be put on hold when the object cache runs out of disk space. It also seems that the object cache does not honor the cache size limit.
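
The mismatch can be cross-checked from the `dog vdi cache info` output quoted below. The following is only a rough sketch, not a sheepdog tool: the helper names (`parse_cache_info`, `summarize`) are made up for illustration, it assumes the reported sizes use binary units (1 GB = 1024 MB), and it assumes the `-w size=` value is in MB.

```python
import re

# Sample copied from the `dog vdi cache info` output quoted in this thread.
SAMPLE = """\
Name             Tag  Total  Dirty   Clean
vm-61037-disk-1       19 GB  0.0 MB  19 GB
vm-61022-disk-1       18 GB  18 GB   48 MB
vm-61012-disk-1       18 GB  18 GB   40 MB
vm-61036-disk-1       19 GB  0.0 MB  19 GB
vm-61025-disk-1       14 GB  0.0 MB  14 GB
vm-61026-disk-1       19 GB  0.0 MB  19 GB
vm-61032-disk-1       19 GB  19 GB   44 MB
vm-61015-disk-1       15 GB  0.0 MB  15 GB
vm-61035-disk-1       17 GB  17 GB   48 MB

Cache size 1.7 GB, used 1.9 GB
"""

# Assumption: binary units, everything normalised to MB.
UNIT_MB = {"MB": 1, "GB": 1024, "TB": 1024 * 1024}
SIZE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(MB|GB|TB)")


def parse_cache_info(text):
    """Return (name, total_mb, dirty_mb, clean_mb) for each VDI row."""
    rows = []
    for line in text.splitlines():
        sizes = SIZE_RE.findall(line)
        if len(sizes) != 3:
            continue  # header, blank line, or the "Cache size ..." summary
        name = line.split()[0]
        total, dirty, clean = (float(v) * UNIT_MB[u] for v, u in sizes)
        rows.append((name, total, dirty, clean))
    return rows


def summarize(rows, limit_mb):
    """Sum the per-VDI figures and compare them with the configured limit."""
    total = sum(r[1] for r in rows)
    dirty = sum(r[2] for r in rows)
    unflushed = [r[0] for r in rows if r[2] > 0]
    return {
        "total_mb": total,
        "dirty_mb": dirty,
        "unflushed": unflushed,
        "over_limit": total > limit_mb,
    }


if __name__ == "__main__":
    # 100000 is the value passed to sheep via -w size=100000.
    print(summarize(parse_cache_info(SAMPLE), limit_mb=100000))
```

With the rounded figures from the listing, the per-VDI totals add up to 158 GB, of which 72 GB (four VDIs) are still dirty, so the cache is well over the configured 100000 MB.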

Regards

Gerald


> -----Original Message-----
> From: Gerald Richter On behalf of Gerald Richter - ECOS
> Sent: Tuesday, 7 January 2014 16:52
> To: 'Hitoshi Mitake'; 'Liu Yuan'
> Cc: 'Lista sheepdog user'
> Subject: RE: [sheepdog-users] sheepdog replication got stuck
> 
> Hi,
> 
> I have rerun the parallel import with cache enabled. As I expected, the cache
> did not get flushed. The import finished more than an hour ago, but
> the cache statistics have not changed since:
> 
> dog vdi cache info
> Name	Tag	Total	Dirty	Clean
> vm-61037-disk-1		19 GB	0.0 MB	19 GB
> vm-61022-disk-1		18 GB	18 GB	48 MB
> vm-61012-disk-1		18 GB	18 GB	40 MB
> vm-61036-disk-1		19 GB	0.0 MB	19 GB
> vm-61025-disk-1		14 GB	0.0 MB	14 GB
> vm-61026-disk-1		19 GB	0.0 MB	19 GB
> vm-61032-disk-1		19 GB	19 GB	44 MB
> vm-61015-disk-1		15 GB	0.0 MB	15 GB
> vm-61035-disk-1		17 GB	17 GB	48 MB
> 
> Cache size 1.7 GB, used 1.9 GB
> 
> The sheep has some CPU usage (between 0% and 3%), but not much.
> 
> The interesting part is the cache size numbers:
> 
> - The size of the cache directory is actually 158G
> - The sum of the above Totals is 159G
> - cache info reports "Cache size 1.7 GB, used 1.9 GB"
> - sheep is started with -w size=100000 (which should be 100G)
> 
> And the disk is 99% full. So the problem might be that the cache is out of space,
> but why does the cache use 158G if it is configured to use 100G?
> 
> I will do more tests later on.
> 
> Thanks for your support
> 
> Regards
> 
> Gerald
> 
> > -----Original Message-----
> > From: Hitoshi Mitake [mailto:mitake.hitoshi at gmail.com]
> > Sent: Tuesday, 7 January 2014 15:03
> > To: Liu Yuan
> > Cc: Gerald Richter; Hitoshi Mitake; Lista sheepdog user
> > Subject: Re: [sheepdog-users] sheepdog replication got stuck
> >
> > At Tue, 7 Jan 2014 15:23:33 +0800,
> > Liu Yuan wrote:
> > >
> > > On Tue, Jan 07, 2014 at 08:13:21AM +0100, Gerald Richter - ECOS wrote:
> > > > Hi,
> > > >
> > > > I am pretty sure that the VM where I lost the data had more than
> > > > 40 MB of changed data.
> > > >
> > > > From what I am seeing here when qemu-img imports the VM: as long as
> > > > there is only one import process, it works fine. As soon as I do
> > > > multiple imports in parallel, the object cache flusher hangs at some
> > > > point (not always the same point) and nothing gets flushed anymore.
> > > > When the import is finished, qemu-img tries to sync, but this
> > > > hangs because the object cache flush hangs, so the import never ends.
> > >
> > > So this strongly indicates an object cache bug for parallel importing.
> > > But I am currently busy with other stuff.
> > >
> > > Hitoshi, do you have time to dig into this problem? If it is reliably
> > > reproducible, I think it is not hard to solve.
> >
> > Yes I'll work on it.
> >
> > Thanks,
> > Hitoshi



