[Sheepdog] [PATCH v5 4/8] sheep: teach sheep to use object cache

Wed Apr 18 08:26:02 CEST 2012

On 03/27/2012 12:01 PM, huxinwei wrote:

>>> My point is about _different_ VMs reading the same object.
>>> It doesn't matter how you optimize guest here ...
>>
>> We can't read the same object from multiple VMs right now. (Farm
>> originally supports naming the object by hash content for data
>> deduplication)
> We can, if these VMs are actually cloned from the same snapshot ;)
> 

Hi Xinwei,

   Just a note: with the object cache enabled, we found that we
already have this feature. The COW object is shared by all the cloned
VMs for reading!

> BTW: I'm not aware that you are planning data dedup already for farm.
> That'll be really awesome ;)
> However, 4M is far too big for effective deduplication, IMHO.
> It seems we need a patch to change the size of object, e.g. 128K as ZFS.
> 

   I have an idea for doing this data dedup inside farm.

   1) keep an on-disk structure (index file) mapping each 4M object to
128K chunks (128K is just a placeholder)
   2) farm stores these chunks instead of 4M objects, each chunk named
(for outside reference) by the hash of its content
   3) the 'index file' uses buffered I/O in writethrough mode to
accelerate reads
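
A minimal sketch of steps 1) and 2), only to illustrate the idea and not
actual farm code. Everything here is an assumption for illustration: the
ChunkStore class, its method names, the in-memory dicts standing in for
the on-disk index file and chunk files, and SHA-1 as the content hash.
Step 3) (buffered writethrough I/O on the index file) is not modeled.

```python
import hashlib

OBJECT_SIZE = 4 * 1024 * 1024   # 4M object size discussed above
CHUNK_SIZE = 128 * 1024         # placeholder chunk size from step 1


class ChunkStore:
    """Hypothetical in-memory stand-in for a chunk-level dedup store."""

    def __init__(self):
        self.chunks = {}   # hex digest -> chunk bytes (content-addressed)
        self.index = {}    # object id -> ordered digests (the "index file")

    def write_object(self, oid, data):
        """Split data into chunks and store each unique chunk only once."""
        digests = []
        for off in range(0, len(data), CHUNK_SIZE):
            chunk = data[off:off + CHUNK_SIZE]
            digest = hashlib.sha1(chunk).hexdigest()
            # An identical chunk already stored is simply referenced again.
            self.chunks.setdefault(digest, chunk)
            digests.append(digest)
        self.index[oid] = digests

    def read_object(self, oid):
        """Reassemble an object by following its index entry."""
        return b"".join(self.chunks[d] for d in self.index[oid])


store = ChunkStore()
obj_a = bytes(OBJECT_SIZE)                        # all-zero 4M object
obj_b = b"\x01" * CHUNK_SIZE + obj_a[CHUNK_SIZE:]  # differs in first chunk
store.write_object("a", obj_a)
store.write_object("b", obj_b)
```

In this toy case two 4M objects that differ in a single 128K region share
every other chunk, which is the kind of saving that whole-object hashing
cannot give. Each read now has to go through the index first, which is
where the extra I/O cost comes from.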

Pros:
    would achieve considerable data reduction through deduplication

Cons:
    will surely hurt farm's I/O performance to some extent

I am planning to implement it as an optional feature for farm, so this
trade-off is at least one chosen by users who enable object cache.

Thanks,
Yuan