[Sheepdog] [PATCH v5 4/8] sheep: teach sheep to use object cache

Mon Mar 26 11:26:45 CEST 2012

> -----Original Message-----
> From: Liu Yuan [mailto:namei.unix at gmail.com]
> Sent: Monday, March 26, 2012 4:01 PM
> To: huxinwei
> Cc: MORITA Kazutaka; sheepdog at lists.wpkg.org
> Subject: Re: [Sheepdog] [PATCH v5 4/8] sheep: teach sheep to use object cache
> 
> On 03/26/2012 03:37 PM, huxinwei wrote:
> 
> > I'd like to have an in-memory cache on the sheep side, instead of the
> > KVM side. It should help reduce the IO hits on disk when a lot of VMs
> > read the same object (consider OS booting or virus scanning). Relying
> > on the in-kernel cache is a possible solution, but its behavior is
> > very hard to predict. Also, as you mentioned earlier, we usually
> > don't have a lot of memory in dom0, which makes the in-memory cache
> > very hard to do with a vanilla kernel.
> >
> > My 2 cents.
> 
> 
> If you want memory hits instead of disk IO, why not directly *enlarge*
> guest memory to increase the page cache size? If the guest's requests
> can be satisfied from its own page cache, the guest doesn't need to be
> halted, waiting for the requests to be satisfied from the host page
> cache.

My point is about _different_ VMs reading the same object.
It doesn't matter how you optimize the guest here ...

> By doing this, we
> 1. need an extra memory copy between host page cache and guest memory
> 2. have to pay for costly vm_entry/vm_exit and several system calls
> 3. travel a long way back and forth through QEMU and sheep
> 
> A double page cache simply wastes CPU cycles and memory.
> 
> If you see a lot of requests hitting the same object, it is not
> sheep's fault; we'd better get the *right* fix into the virtio_blk
> driver and the IO scheduler in the guest kernel. (I might do it if
> time permits.)
> 
> Currently, the sheep object size is 4M, but by default our kernel only
> issues requests no larger than 512K. A simple tweak is to raise the
> request size to match the sheep object size, namely 4M, via
> 'max_sectors_kb' in /sys/block/vda/queue/.
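
A minimal sketch of the tuning described above, for illustration only.
The helper names raise_max_sectors_kb and requests_per_object are
hypothetical and not part of sheep or QEMU. It assumes the queue
directory also exposes the standard max_hw_sectors_kb file, which caps
what the kernel accepts for max_sectors_kb, so the target is clamped
to that limit before writing.

```python
from pathlib import Path

SHEEP_OBJECT_KB = 4096  # 4M sheep object size, as stated in this thread


def raise_max_sectors_kb(queue_dir, target_kb=SHEEP_OBJECT_KB):
    """Raise max_sectors_kb toward target_kb and return the value written.

    queue_dir is a block device queue directory such as
    /sys/block/vda/queue (hypothetical helper; writing there needs root).
    """
    queue = Path(queue_dir)
    # The kernel rejects values above the hardware limit, so clamp first.
    hw_limit_kb = int((queue / "max_hw_sectors_kb").read_text())
    new_kb = min(target_kb, hw_limit_kb)
    (queue / "max_sectors_kb").write_text(f"{new_kb}\n")
    return new_kb


def requests_per_object(request_kb, object_kb=SHEEP_OBJECT_KB):
    """Requests needed to cover one object (ceiling division)."""
    return -(-object_kb // request_kb)
```

Inside the guest, as root, this would be called as
raise_max_sectors_kb("/sys/block/vda/queue"). With the 512K default,
requests_per_object(512) shows that one 4M object takes 8 requests.
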
> 
> This would help the situation a little, but unfortunately the
> virtio_blk or SATA driver, combined with the IO scheduler, doesn't
> merge requests well enough to reach 4M chunks (in my dd test, we get
> request chunks of up to 3M). This certainly needs fixing.
> 
> Thanks,
> Yuan