[sheepdog] Questions on the virtual disk's cache type

Liu Yuan namei.unix at gmail.com
Wed Jan 23 14:15:47 CET 2013


On 01/23/2013 08:34 PM, Stefan Hajnoczi wrote:
> On Wed, Jan 23, 2013 at 06:47:55PM +0800, Liu Yuan wrote:
>> On 01/23/2013 06:14 PM, Daniel P. Berrange wrote:
>>> On Wed, Jan 23, 2013 at 06:09:01PM +0800, Liu Yuan wrote:
>>>> On 01/23/2013 05:30 PM, Daniel P. Berrange wrote:
>>>>> FYI There is a patch proposed for customization
>>>>>
>>>>>   https://review.openstack.org/#/c/18042/
>>>>>
>>>>
>>>> It seems that this patch was dropped and declined?
>>>>
>>>>>
>>>>> I should note that it is wrong to assume that enabling cache mode will
>>>>> improve the performance in general. Allowing caching in the host will
>>>>> require a non-negligible amount of host RAM to have a benefit. RAM is
>>>>> usually the most constrained resource in any virtualization environment.
>>>>> So while the cache may help performance when only one or two VMs are
>>>>> running on the host, it may well in fact hurt performance once the host
>>>>> is running enough VMs to max out RAM. So allowing caching will actually
>>>>> give you quite variable performance, while the cache=none will give you
>>>>> consistent performance regardless of host RAM utilization (underlying
>>>>> contention of the storage device may of course still impact things).
>>>>
>>>> Yeah, allowing the page cache in the host might not be a good idea when
>>>> running multiple VMs, but the cache type in QEMU has a different meaning
>>>> for network block devices. For example, we use the cache type to control
>>>> the client-side cache of a Sheepdog cluster, which implements an object
>>>> cache on the local disk to boost performance and reduce network traffic.
>>>> This doesn't consume memory at all; it only occupies disk space on the
>>>> node where the sheep daemon runs.
> 
> How can it be a "client-side cache" if it doesn't consume memory on the
> client?
> 
> Please explain how the "client-side cache" feature works.  I'm not
> familiar with sheepdog internals.
> 

Let me start with a local file as the backend of a QEMU block device.
QEMU basically uses host memory pages to cache blocks of the emulated
device. The kernel internally maps those blocks into pages of the file
(a.k.a. the page cache), and we then rely on the kernel memory subsystem
to write back those cached pages. When the VM reads or writes some
blocks, the kernel allocates pages on demand to serve those requests.

QEMU <----> VM
  ^
  |                       writeback/readahead pages
  V                              |
POSIX file < --- > page cache < --- > disk
                      |
        kernel does page wb/ra and reclaim
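
To make the distinction concrete, here is a minimal sketch (plain POSIX
C, not QEMU code) of the two host-side paths above: a buffered write
that lands in the page cache and is written back later, versus an
O_DIRECT write that bypasses the cache, which is roughly what
cache=none/directsync select for a local file. The file names and the
4096-byte alignment are just assumptions for the example.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* Buffered path: data lands in the host page cache first and the
     * kernel writes it back to the disk later. */
    int buffered_fd = open("/tmp/buffered.img", O_RDWR | O_CREAT, 0600);
    (void)write(buffered_fd, "data", 4);   /* cached in host memory */
    fsync(buffered_fd);                    /* force writeback, like a flush */

    /* Direct path: O_DIRECT bypasses the page cache entirely, so no host
     * memory is used for caching; the I/O must be block aligned. */
    int direct_fd = open("/tmp/direct.img", O_RDWR | O_CREAT | O_DIRECT, 0600);
    void *buf;
    posix_memalign(&buf, 4096, 4096);
    memset(buf, 0, 4096);
    (void)write(direct_fd, buf, 4096);     /* goes straight to the device */

    free(buf);
    close(buffered_fd);
    close(direct_fd);
    return 0;
}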

Sheepdog's object cache does something similar; the difference is that
we map the requested blocks into objects (which are plain fixed-size
files on each node), and the sheep daemon plays the role of the kernel,
writing back dirty objects and reclaiming clean ones to make room for
objects needed by other requests.

QEMU <----> VM
  ^
  |                           push/pull objects
  V                               |
SD device < --- > object cache < --- >  SD replicated object storage.
                      |
               Sheep daemon does object push/pull and reclaim


An object is implemented as a fixed-size file on disk, so for the object
cache those objects are all fixed-size files on the node where the sheep
daemon runs, and sheep does direct I/O on them. In that sense we don't
consume memory, apart from the objects' metadata (inodes & dentries) on
the node.
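
As an illustration only (this is not the actual sheep code; the cache
directory, object size and object id below are assumptions made up for
the example), the local side of the object cache boils down to keeping
each object as a fixed-size file and opening it with O_DIRECT, so the
cache occupies disk space rather than host memory:

#define _GNU_SOURCE
#include <fcntl.h>
#include <inttypes.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define OBJECT_SIZE (4 * 1024 * 1024)  /* assumed fixed object size */

/* Open (or create) the local file that caches one object.  O_DIRECT
 * means reads and writes hit the local disk, not the host page cache;
 * only the file's inode/dentry metadata ends up in kernel memory. */
static int open_cache_object(const char *cache_dir, uint64_t oid)
{
    char path[PATH_MAX];

    snprintf(path, sizeof(path), "%s/%016" PRIx64, cache_dir, oid);
    return open(path, O_RDWR | O_CREAT | O_DIRECT, 0600);
}

int main(void)
{
    /* Hypothetical cache directory and object id, for illustration. */
    int fd = open_cache_object("/var/lib/sheepdog/cache",
                               0x0011223344556677ULL);
    if (fd >= 0)
        close(fd);
    return 0;
}

When a cached object becomes dirty, sheep later pushes it back to the
replicated object storage (the writeback step in the diagram above), and
clean objects can simply be dropped to reclaim disk space for new ones.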

>>> That is a serious abuse of the QEMU cache type variable. You now have one
>>> setting with two completely different meanings for the same value. If you
>>> want to control whether the sheepdog driver uses a local disk for object
>>> cache you should have a completely separate QEMU command line setting
>>> which can be controlled independently of the cache= setting.
>>>
>>
>> Hello Stefan and Kevin,
>>
>>   Should the sheepdog driver use a new, separate command-line setting to
>> control its internal cache?
>>
>>   For a network block device, which simply forwards the I/O requests from
>> VMs over the network and never has a chance to touch the host's memory, I
>> think it is okay to multiplex 'cache=type', but it seems that this causes
>> confusion in the libvirt code.
> 
> From block/sheepdog.c:
> 
> /*
>  * QEMU block layer emulates writethrough cache as 'writeback + flush', so
>  * we always set SD_FLAG_CMD_CACHE (writeback cache) as default.
>  */
> s->cache_flags = SD_FLAG_CMD_CACHE;
> if (flags & BDRV_O_NOCACHE) {
>     s->cache_flags = SD_FLAG_CMD_DIRECT;
> }
> 
> That means -drive cache=none and -drive cache=directsync use
> SD_FLAG_CMD_DIRECT.
> 
> And -drive cache=writeback and cache=writethrough use SD_FLAG_CMD_CACHE.
> 
> This matches the behavior that QEMU uses for local files:
> none/directsync mean O_DIRECT and writeback/writethrough go via the page
> cache.
> 
> When you use NFS, O_DIRECT also means bypassing the client-side cache.  Where is
> the issue?

I don't have any issue with this; it's just that Daniel complained that
Sheepdog possibly abuses the cache flags, which he thinks should be
page-cache oriented only, if I understand him correctly.

Thanks,
Yuan


