[Sheepdog] [PATCH v5 4/8] sheep: teach sheep to use object cache
Huxinwei
huxinwei at huawei.com
Wed Apr 18 11:46:45 CEST 2012
> -----Original Message-----
> From: Liu Yuan [mailto:namei.unix at gmail.com]
> Sent: Wednesday, April 18, 2012 2:26 PM
> To: Huxinwei
> Cc: MORITA Kazutaka; sheepdog at lists.wpkg.org; Christoph Hellwig
> Subject: Re: [Sheepdog] [PATCH v5 4/8] sheep: teach sheep to use object cache
>
> On 03/27/2012 12:01 PM, huxinwei wrote:
>
> >>> My point is about _different_ VMs reading the same object.
> >>> It doesn't matter how you optimize guest here ...
> >>
> >>
> >> We can't read the same object from multiple VMs right now. (Farm
> >> originally supports naming the object by hash content for data
> >> deduplication)
> > We can, if these VMs are actually cloned from the same snapshot ;)
> >
>
>
> Hi Xinwei,
>
> Just a note that with object cache enabled, we found that we
> already have this feature: the COW object is shared by all the cloned
> VMs for reading!
That is really great ;)
> > BTW: I wasn't aware that you are already planning data dedup for farm.
> > That'll be really awesome ;)
> > However, 4M is far too big for effective deduplication, IMHO.
> > It seems we need a patch to change the object size, e.g. to 128K as in ZFS.
> >
>
>
> I have an idea to do this data dedup inside farm.
>
> 1) have an on-disk structure (index file) mapping 4M objects into
> 128k (whatever, just a placeholder) chunks
> 2) farm stores these chunks instead of 4M objects, each named (for
> reference from outside) by a hash of its content
> 3) the index file uses buffered IO in writethrough mode to accelerate reads
>
> Pros:
> would get a considerable data reduction
>
> Cons:
> will surely damage farm's IO performance to some extent
>
> I am planning to implement it as an optional feature for farm, so this
> trade-off is at least desired by users who enable object cache.
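To make the three steps above concrete, here is a rough sketch of the index-plus-chunks idea. It is only an illustration, not farm code: the class name DedupStore, the in-memory dicts standing in for the on-disk index file and chunk files, and the choice of SHA-1 are all assumptions made for the example. Chunk reference counting, garbage collection and the writethrough buffering from step 3 are left out.

```python
import hashlib

OBJECT_SIZE = 4 * 1024 * 1024   # 4M object size discussed in the thread
CHUNK_SIZE = 128 * 1024         # 128K, the placeholder chunk size from the thread


class DedupStore:
    """Hypothetical in-memory stand-in for a chunk store plus index file."""

    def __init__(self, chunk_size=CHUNK_SIZE):
        self.chunk_size = chunk_size
        self.chunks = {}   # content hash -> chunk bytes, each stored once
        self.index = {}    # object id -> ordered list of content hashes

    def write_object(self, oid, data):
        """Split an object into chunks and record the hash list (steps 1 and 2)."""
        hashes = []
        for off in range(0, len(data), self.chunk_size):
            chunk = data[off:off + self.chunk_size]
            # SHA-1 is an assumption here; any content hash would do.
            digest = hashlib.sha1(chunk).hexdigest()
            # Identical chunks map to the same key, so they are kept only once.
            self.chunks.setdefault(digest, chunk)
            hashes.append(digest)
        self.index[oid] = hashes

    def read_object(self, oid):
        """Rebuild an object by walking its index entry; this is the extra read cost."""
        return b"".join(self.chunks[h] for h in self.index[oid])

    def stored_bytes(self):
        """Bytes actually held in the chunk store after deduplication."""
        return sum(len(c) for c in self.chunks.values())
```

With two 4M objects that differ only in their last 128K, such a store keeps two distinct chunks instead of 8M of data, while every read has to go through the index first. That extra lookup is where the IO cost listed under Cons comes from.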
You may want to have a look at this:
http://ansrlab.cse.cuhk.edu.hk/software/livedfs/
They open sourced the implementation of livedfs, which is a dedup solution for VM images.
I think it achieves a reasonable balance between performance and space saving,
so there is something to learn from it.
FYI.
> Thanks,
> Yuan