[sheepdog] Questions on the virtual disk's cache type

Stefan Hajnoczi stefanha at redhat.com
Wed Jan 23 15:15:53 CET 2013


On Wed, Jan 23, 2013 at 09:15:47PM +0800, Liu Yuan wrote:
> On 01/23/2013 08:34 PM, Stefan Hajnoczi wrote:
> > On Wed, Jan 23, 2013 at 06:47:55PM +0800, Liu Yuan wrote:
> >> On 01/23/2013 06:14 PM, Daniel P. Berrange wrote:
> >>> On Wed, Jan 23, 2013 at 06:09:01PM +0800, Liu Yuan wrote:
> >>>> On 01/23/2013 05:30 PM, Daniel P. Berrange wrote:
> >>>>> FYI There is a patch proposed for customization
> >>>>>
> >>>>>   https://review.openstack.org/#/c/18042/
> >>>>>
> >>>>
> >>>> It seems this patch was dropped and declined?
> >>>>
> >>>>>
> >>>>> I should note that it is wrong to assume that enabling cache mode will
> >>>>> improve the performance in general. Allowing caching in the host will
> >>>>> require a non-negligible amount of host RAM to have a benefit. RAM is
> >>>>> usually the most constrained resource in any virtualization environment.
> >>>>> So while the cache may help performance when only one or two VMs are
> >>>>> running on the host, it may well in fact hurt performance once the host
> >>>>> is running enough VMs to max out RAM. So allowing caching will actually
> >>>>> give you quite variable performance, while cache=none will give you
> >>>>> consistent performance regardless of host RAM utilization (underlying
> >>>>> contention of the storage device may of course still impact things).
> >>>>
> >>>> Yeah, allowing the page cache in the host might not be a good idea when
> >>>> running multiple VMs, but the cache type in QEMU has a different meaning
> >>>> for network block devices. For example, we use 'cache type' to control
> >>>> the client-side cache of a Sheepdog cluster, which implements an object
> >>>> cache on the local disk to boost performance and reduce network traffic.
> >>>> This doesn't consume memory at all, it just occupies disk space on the
> >>>> node where the sheep daemon runs.
> > 
> > How can it be a "client-side cache" if it doesn't consume memory on the
> > client?
> > 
> > Please explain how the "client-side cache" feature works.  I'm not
> > familiar with sheepdog internals.
> > 
> 
> Let me start with a local file as the backend of a QEMU block device. It
> basically uses host memory pages to cache blocks of the emulated device.
> The kernel internally maps those blocks into pages of the file (a.k.a. the
> page cache) and then we rely on the kernel memory subsystem to write back
> those cached pages. When the VM reads or writes some blocks, the kernel
> allocates pages on demand to serve the read/write requests operating on
> those pages.
> 
> QEMU <----> VM
>   ^
>   |                       writeback/readahead pages
>   V                              |
> POSIX file < --- > page cache < --- > disk
>                       |
>         kernel does page wb/ra and reclaim
> 
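> To put it another way (just a rough sketch, not the actual QEMU code),
> for a local file the cache= setting mostly boils down to whether the
> image is opened with O_DIRECT or left to the page cache:
> 
> /* simplified illustration, not real QEMU code */
> #define _GNU_SOURCE
> #include <fcntl.h>
> #include <stdbool.h>
> 
> static int open_image(const char *path, bool host_cache)
> {
>     int flags = O_RDWR;
> 
>     if (!host_cache) {
>         /* cache=none/directsync: bypass the host page cache */
>         flags |= O_DIRECT;
>     }
>     /* cache=writeback/writethrough: buffered I/O; the kernel caches
>      * the blocks in the page cache and writes them back later */
>     return open(path, flags);
> }
> 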
> Sheepdog's object cache does a similar thing; the difference is that we
> map the requested blocks into objects (which are plain fixed-size files
> on each node), and the sheep daemon plays the role of the kernel, writing
> back dirty objects and reclaiming clean objects to make room to allocate
> objects for other requests.
> 
> QEMU <----> VM
>   ^
>   |                           push/pull objects
>   V                               |
> SD device < --- > object cache < --- >  SD replicated object storage.
>                       |
>                Sheep daemon does object push/pull and reclaim
> 
> 
> An object is implemented as a fixed-size file on disk, so for the object
> cache those objects are all fixed-size files on the node where the sheep
> daemon runs, and sheep does direct I/O on them. In this sense we don't
> consume memory, except for those objects' metadata (inode & dentry) on
> the node.
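> 
> Roughly, a read served from the object cache looks like this (a very
> simplified sketch, not the real sheep code; the object path and the
> O_DIRECT alignment handling are only illustrative):
> 
> /* simplified illustration of an object cache read */
> #define _GNU_SOURCE
> #include <fcntl.h>
> #include <unistd.h>
> 
> static ssize_t object_cache_read(const char *obj_path, void *buf,
>                                  size_t len, off_t offset)
> {
>     /* each object is a fixed-size file on the local disk; opening it
>      * with O_DIRECT means no host page cache is consumed
>      * (buf/len/offset must be properly aligned for O_DIRECT) */
>     int fd = open(obj_path, O_RDWR | O_DIRECT);
>     ssize_t ret;
> 
>     if (fd < 0) {
>         /* cache miss: sheep would first pull the object from the
>          * replicated object storage over the network */
>         return -1;
>     }
>     ret = pread(fd, buf, len, offset);
>     close(fd);
>     return ret;
> }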

Does QEMU usually talk to a local sheepdog daemon?  I guess it must do
that, otherwise the cache doesn't avoid network traffic.

> >>> That is a serious abuse of the QEMU cache type variable. You now have one
> >>> setting with two completely different meanings for the same value. If you
> >>> want to control whether the sheepdog driver uses a local disk for object
> >>> cache you should have a completely separate QEMU command line setting
> >>> which can be controlled independently of the cache= setting.
> >>>
> >>
> >> Hello Stefan and Kevin,
> >>
> >>   Should the sheepdog driver use a separate command-line setting to
> >> control its internal cache?
> >>
> >>   For a network block device, which simply forwards the I/O requests from
> >> VMs over the network and never has a chance to touch the host's memory, I
> >> think it is okay to multiplex 'cache=type', but it looks like it causes
> >> confusion for the libvirt code.
> > 
> > From block/sheepdog.c:
> > 
> > /*
> >  * QEMU block layer emulates writethrough cache as 'writeback + flush', so
> >  * we always set SD_FLAG_CMD_CACHE (writeback cache) as default.
> >  */
> > s->cache_flags = SD_FLAG_CMD_CACHE;
> > if (flags & BDRV_O_NOCACHE) {
> >     s->cache_flags = SD_FLAG_CMD_DIRECT;
> > }
> > 
> > That means -drive cache=none and -drive cache=directsync use
> > SD_FLAG_CMD_DIRECT.
> > 
> > And -drive cache=writeback and cache=writethrough use SD_FLAG_CMD_CACHE.
> > 
> > This matches the behavior that QEMU uses for local files:
> > none/directsync mean O_DIRECT and writeback/writethrough go via the page
> > cache.
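> > 
> > For example (the VDI name here is just for illustration):
> > 
> >   -drive file=sheepdog:Alice,cache=none         -> SD_FLAG_CMD_DIRECT
> >   -drive file=sheepdog:Alice,cache=directsync   -> SD_FLAG_CMD_DIRECT
> >   -drive file=sheepdog:Alice,cache=writeback    -> SD_FLAG_CMD_CACHE
> >   -drive file=sheepdog:Alice,cache=writethrough -> SD_FLAG_CMD_CACHE
> >                                                    (flush emulated on top)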
> > 
> > When you use NFS O_DIRECT also means bypass client-side cache.  Where is
> > the issue?
> 
> I don't have any issue with this; it's just that Daniel complained that
> Sheepdog possibly abuses the cache flags, which he thinks should be page
> cache oriented only, if I understand correctly.

Daniel: I know of users setting cache= differently depending on local
files vs NFS.  That's because O_DIRECT isn't well-defined and has no
impact on guest I/O semantics.  It's purely a performance option that
you can choose according to your workload and host configuration - just
like Sheepdog's SD_FLAG_CMD_DIRECT.

Stefan


