On 03/27/2012 12:01 PM, huxinwei wrote:
>>> My point is about _different_ VMs reading the same object.
>>> It doesn't matter how you optimize guest here ...
>>
>> We can't read the same object from multiple VMs right now. (Farm
>> originally supports naming the object by hash content for data
>> deduplication)
> We can, if these VMs are actually cloned from the same snapshot ;)

Hi Xinwei,

Just a note: with object cache enabled, we found that we already have this
feature. The COW object is shared by all the cloned VMs for reading!

> BTW: I'm not aware that you are planning data dedup already for farm.
> That'll be really awesome ;)
> However, 4M is far too big for effective deduplication, IMHO.
> It seems we need a patch to change the size of object, e.g. 128K as ZFS.

I have an idea for doing this data dedup inside farm:

1) have an on-disk structure (index file) mapping 4M objects into 128K
   chunks (the size is just a placeholder)
2) farm stores these chunks instead of 4M objects, each named by a hash of
   its content (the name used from outside)
3) the 'index file' uses buffered IO in writethrough mode to accelerate reads

Pros: would give a considerable reduction in stored data
Cons: will surely hurt farm's IO performance to some extent

I am planning to implement it as an optional feature for farm, so this
trade-off should at least be acceptable to users who enable object cache.

Thanks,
Yuan
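
P.S. A rough sketch of steps 1) and 2) above, for illustration only. This is not farm code: the class and method names are hypothetical, SHA-1 is just an assumed choice of hash, and Python dicts stand in for the on-disk chunk files and the index file. Step 3) (buffered writethrough IO on the index file) is not modeled.

```python
import hashlib

OBJECT_SIZE = 4 * 1024 * 1024   # 4M object size discussed in this thread
CHUNK_SIZE = 128 * 1024         # 128K, a placeholder chunk size


class ChunkStore:
    """Hypothetical content-addressed chunk store with a per-object index."""

    def __init__(self):
        self.chunks = {}  # hex digest -> chunk bytes (stand-in for chunk files)
        self.index = {}   # object id -> ordered list of digests (the 'index file')

    def write_object(self, oid, data):
        """Split an object into chunks and store each chunk under its hash."""
        digests = []
        for off in range(0, len(data), CHUNK_SIZE):
            chunk = data[off:off + CHUNK_SIZE]
            digest = hashlib.sha1(chunk).hexdigest()
            # Identical chunks hash to the same name, so they are stored once.
            self.chunks.setdefault(digest, chunk)
            digests.append(digest)
        self.index[oid] = digests

    def read_object(self, oid):
        """Reassemble an object by following its index entry."""
        return b"".join(self.chunks[d] for d in self.index[oid])
```

With this layout, two 4M objects that differ in only one 128K chunk share the other 31 chunks on disk. The price is the extra index lookup on every read, which is the IO cost listed under Cons.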