[sheepdog] Questions on the virtual disk's cache type
Liu Yuan
namei.unix at gmail.com
Wed Jan 23 15:29:09 CET 2013
On 01/23/2013 10:15 PM, Stefan Hajnoczi wrote:
> On Wed, Jan 23, 2013 at 09:15:47PM +0800, Liu Yuan wrote:
>> > On 01/23/2013 08:34 PM, Stefan Hajnoczi wrote:
>>> > > On Wed, Jan 23, 2013 at 06:47:55PM +0800, Liu Yuan wrote:
>>>> > >> On 01/23/2013 06:14 PM, Daniel P. Berrange wrote:
>>>>> > >>> On Wed, Jan 23, 2013 at 06:09:01PM +0800, Liu Yuan wrote:
>>>>>> > >>>> On 01/23/2013 05:30 PM, Daniel P. Berrange wrote:
>>>>>>> > >>>>> FYI There is a patch proposed for customization
>>>>>>> > >>>>>
>>>>>>> > >>>>> https://review.openstack.org/#/c/18042/
>>>>>>> > >>>>>
>>>>>> > >>>>
>>>>>> > >>>> Seems that this patch is dropped and declined?
>>>>>> > >>>>
>>>>>>> > >>>>>
>>>>>>> > >>>>> I should note that it is wrong to assume that enabling cache mode will
>>>>>>> > >>>>> improve the performance in general. Allowing caching in the host will
>>>>>>> > >>>>> require a non-negligible amount of host RAM to have a benefit. RAM is
>>>>>>> > >>>>> usually the most constrained resource in any virtualization environment.
>>>>>>> > >>>>> So while the cache may help performance when only one or two VMs are
>>>>>>> > >>>>> running on the host, it may well in fact hurt performance once the host
>>>>>>> > >>>>> is running enough VMs to max out RAM. So allowing caching will actually
>>>>>>> > >>>>> give you quite variable performance, while the cache=none will give you
>>>>>>> > >>>>> consistent performance regardless of host RAM utilization (underlying
>>>>>>> > >>>>> contention of the storage device may of course still impact things).
>>>>>> > >>>>
>>>>>> > >>>> Yeah, allowing page cache in the host might not be a good idea when running
>>>>>> > >>>> multiple VMs, but cache type in QEMU has a different meaning for network
>>>>>> > >>>> block devices. E.g., we use 'cache type' to control the client-side cache
>>>>>> > >>>> of a Sheepdog cluster, which implements an object cache on the local disk
>>>>>> > >>>> to boost performance and reduce network traffic. This doesn't
>>>>>> > >>>> consume memory at all; it just occupies disk space on the node where the
>>>>>> > >>>> sheep daemon runs.
>>> > >
>>> > > How can it be a "client-side cache" if it doesn't consume memory on the
>>> > > client?
>>> > >
>>> > > Please explain how the "client-side cache" feature works. I'm not
>>> > > familiar with sheepdog internals.
>>> > >
>> >
>> > Let me start with a local file as the backend of a QEMU block device. It
>> > basically uses host memory pages to cache blocks of the emulated device.
>> > The kernel internally maps those blocks into pages of the file (a.k.a. the
>> > page cache), and then we rely on the kernel memory subsystem to do writeback
>> > of those cached pages. When the VM reads/writes some blocks, the kernel
>> > allocates pages on demand to serve the read/write requests on those pages.
>> >
>> >     QEMU <----> VM
>> >       ^
>> >       |               writeback/readahead pages
>> >       V                           |
>> >  POSIX file < --- > page cache < --- > disk
>> >                         |
>> >         kernel does page wb/ra and reclaim
>> >
>> > The object cache of Sheepdog does a similar thing. The difference is that
>> > we map the requested blocks into objects (which are plain fixed-size
>> > files on each node), and the sheep daemon plays the role of the kernel: it
>> > does writeback of the dirty objects and reclaims the clean objects to
>> > make room to allocate objects for other requests.
>> >
>> >    QEMU <----> VM
>> >      ^
>> >      |                     push/pull objects
>> >      V                             |
>> >  SD device < --- > object cache < --- > SD replicated object storage.
>> >                         |
>> >     Sheep daemon does object push/pull and reclaim
>> >
>> >
>> > An object is implemented as a fixed-size file on disk, so for the object
>> > cache, those objects are all fixed-size files on the node where the sheep
>> > daemon runs, and sheep does directio on them. In this sense we don't
>> > consume memory, except for those objects' metadata (inode & dentry) on the node.
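
The block-to-object mapping described above can be sketched roughly as
follows. This is only an illustration: the 4 MiB OBJECT_SIZE and the helper
name objects_for_request are assumptions made for the example, not names
taken from the sheepdog source.

```python
# Illustrative only: OBJECT_SIZE and objects_for_request are assumptions
# for this sketch, not identifiers from the sheepdog code base.
OBJECT_SIZE = 4 * 1024 * 1024  # assumed fixed object size, in bytes


def objects_for_request(offset, length, object_size=OBJECT_SIZE):
    """Split the byte range [offset, offset + length) of a virtual disk
    into (object_index, offset_in_object, chunk_length) tuples."""
    chunks = []
    end = offset + length
    while offset < end:
        index = offset // object_size          # which fixed-size object
        start_in_obj = offset % object_size    # where inside that object
        chunk = min(object_size - start_in_obj, end - offset)
        chunks.append((index, start_in_obj, chunk))
        offset += chunk
    return chunks
```

A request that crosses an object boundary simply touches two objects, each
of which would be a separate fixed-size file in the cache directory.
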
> Does QEMU usually talk to a local sheepdog daemon? I guess it must do
> that, otherwise the cache doesn't avoid network traffic.
>
Yeah, mostly on the same node, but there are also users who run VMs and
sheep daemons on different nodes.

Even with QEMU and the sheep daemon on different nodes, the object cache
still helps reduce network traffic: without the object cache, a
single write request is replicated to N (copies) nodes, while with the
object cache, the write request is handled on only one node and
returns after the operation on the hit object (like a page cache hit).

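
To make the traffic argument concrete, here is a rough counting sketch. It
is a toy model, not sheepdog code: the function names are made up, and it
assumes every write hits the cache and that repeated writes to the same
object are pushed to the replicas only once at writeback time.

```python
# Toy model only; function names and the coalescing behaviour at
# writeback time are assumptions made for this illustration.

def node_ops_without_cache(writes, copies):
    """Every write is replicated to `copies` nodes before it returns."""
    return len(writes) * copies


def node_ops_with_cache(writes, copies):
    """Every write is handled on one node (the cache) before it returns.
    Dirty objects are pushed to the replicas later, once per object."""
    dirty_objects = set(writes)
    return {
        "at_request_time": len(writes),
        "at_writeback": len(dirty_objects) * copies,
    }


# Four writes, three of them to the same object, with 3 copies.
writes = [7, 7, 7, 8]
print(node_ops_without_cache(writes, copies=3))
print(node_ops_with_cache(writes, copies=3))
```

In this model the request path touches one node per write instead of N, and
the replication cost moves to the background writeback.
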
So directio in Sheepdog means bypassing the object cache and doing the normal
request handling, which is replicated to all the replica nodes. I think
this also conforms to the page cache directio semantics, where directio
bypasses the page cache.

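
A minimal sketch of the cached path versus the directio path, under stated
assumptions: ToyCluster and ToyObjectCache are hypothetical names, replicas
are plain dicts, and dropping a stale cached copy on a directio write is my
simplification for the model rather than a description of what sheep does.

```python
# Toy model; class names and the invalidate-on-directio behaviour are
# assumptions for illustration, not the sheepdog implementation.

class ToyCluster:
    """Replicated object storage: every write goes to all replicas."""

    def __init__(self, copies):
        self.replicas = [dict() for _ in range(copies)]

    def write(self, oid, data):
        for replica in self.replicas:
            replica[oid] = data

    def read(self, oid):
        return self.replicas[0].get(oid)


class ToyObjectCache:
    """Write-back cache in front of the cluster."""

    def __init__(self, cluster):
        self.cluster = cluster
        self.objects = {}   # cached objects, keyed by object id
        self.dirty = set()  # object ids not yet pushed to the cluster

    def write(self, oid, data, directio=False):
        if directio:
            # Bypass the cache: normal replicated request handling.
            self.objects.pop(oid, None)
            self.dirty.discard(oid)
            self.cluster.write(oid, data)
            return
        # Cached write: handled locally, returns without replication.
        self.objects[oid] = data
        self.dirty.add(oid)

    def read(self, oid):
        if oid in self.objects:
            return self.objects[oid]
        return self.cluster.read(oid)

    def writeback(self):
        """Push dirty objects to the replicas (the daemon's job)."""
        for oid in sorted(self.dirty):
            self.cluster.write(oid, self.objects[oid])
        self.dirty.clear()
```

A cached write is visible through the cache immediately but reaches the
replicas only after writeback(), while a directio write reaches all
replicas before it returns.
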
Thanks,
Yuan