[Sheepdog] [PATCH v5 4/8] sheep: teach sheep to use object cache

Huxinwei huxinwei at huawei.com
Wed Apr 18 11:46:45 CEST 2012


> -----Original Message-----
> From: Liu Yuan [mailto:namei.unix at gmail.com]
> Sent: Wednesday, April 18, 2012 2:26 PM
> To: Huxinwei
> Cc: MORITA Kazutaka; sheepdog at lists.wpkg.org; Christoph Hellwig
> Subject: Re: [Sheepdog] [PATCH v5 4/8] sheep: teach sheep to use object cache
> 
> On 03/27/2012 12:01 PM, huxinwei wrote:
> 
> >>> My point is about _different_ VMs reading the same object.
> >>> > > It doesn't matter how you optimize guest here ...
> >> >
> >> >
> >> > We can't read the same object from multiple VMs right now. (Farm
> >> > originally supports naming objects by the hash of their content for data
> >> > deduplication.)
> > We can, if these VMs are actually cloned from the same snapshot ;)
> >
> 
> 
> Hi Xinwei,
> 
>    Just a heads-up: with the object cache enabled, we found that we
> already have this feature, because the COW object is shared for reading
> by all the cloned VMs!

That is really great ;)

> > BTW: I wasn't aware that you were already planning data dedup for Farm.
> > That'll be really awesome ;)
> > However, 4M is far too big for effective deduplication, IMHO.
> > It seems we need a patch to change the object size, e.g. to 128K as in ZFS.
> >
> 
> 
>    I have an idea for doing data dedup inside Farm:
> 
>    1) have an on-disk structure (index file) mapping 4M objects into
> 128K (or whatever size; just a placeholder) chunks
>    2) Farm stores these chunks instead of 4M objects, each named (for
> external lookup) by a hash of its content
>    3) the index file uses buffered I/O in writethrough mode to accelerate reads
> 
> Pros:
>     considerable data reduction
> 
> Cons:
>     will inevitably hurt Farm's I/O performance to some extent
> 
> I am planning to implement it as an optional feature for Farm, so that
> only users who opt in take this trade-off; it should at least suit users
> who enable the object cache.

You may want to have a look at this:
http://ansrlab.cse.cuhk.edu.hk/software/livedfs/
They open-sourced their implementation of LiveDFS, a dedup solution for VM images.

I think it strikes a reasonable balance between performance and space savings.
There's something to learn from it.

FYI.

> Thanks,
> Yuan


