[Sheepdog] [PATCH v2] sheepdog: implement SD_OP_FLUSH_VDI operation

Liu Yuan namei.unix at gmail.com
Sat Mar 31 07:37:19 CEST 2012


On 03/31/2012 01:03 PM, MORITA Kazutaka wrote:

> At Sat, 31 Mar 2012 11:48:07 +0800,
> Liu Yuan wrote:
>>
>> On 03/31/2012 12:17 AM, MORITA Kazutaka wrote:
>>
>>> It might be better to ignore BDRV_O_NOCACHE here because:
>>>
>>>  - When writeback is enabled, we always use a cache.  And when
>>>    writeback is disabled, we don't use a cache at all.  This means
>>>    that users cannot specify whether to use a cache.
>>>
>>>  - I think qemu users expect a better performance if cache=none, which
>>>    means BDRV_O_NOCACHE | BDRV_O_CACHE_WB, is specified
>>>
>>
>>
>> I have to admit that this is the first time I have understood that
>> cache=none means a cache with DIO mode.
>>
>> So my question is: what is a cache with DIO mode?
> 
> E.g. a volatile write cache of the physical disk.
> 
>>
>> I took a glance at the code:
>>
>>     /* Use O_DSYNC for write-through caching, no flags for write-back caching,
>>      * and O_DIRECT for no caching. */
>>     if ((bdrv_flags & BDRV_O_NOCACHE))
>>         s->open_flags |= O_DIRECT;
>>     if (!(bdrv_flags & BDRV_O_CACHE_WB))
>>         s->open_flags |= O_DSYNC;
>>
>> BDRV_O_NOCACHE means the kernel's page cache is bypassed. I don't
>> think any 'writeback' cache exists in cache=none mode, so 'better
>> performance' doesn't make sense if the host has extra memory that
>> could be used as page cache.
> 
> O_DIRECT bypasses the page cache, but not the other ones like a disk
> write cache.  We need to add O_DSYNC to flush data completely.
> 
> In my environment, I can confirm this:
> 
> * benchmark with a disk write cache
> 
>   # hdparm -W 1 /dev/sdb
>   
>   /dev/sdb:
>    setting drive write-caching to 1 (on)
>    write-caching =  1 (on)
> 
>   # dd if=/dev/zero of=/dev/sdb5 bs=1M count=64 oflag=direct
>   64+0 records in
>   64+0 records out
>   67108864 bytes (67 MB) copied, 0.974981 s, 68.8 MB/s
> 
>   # dd if=/dev/zero of=/dev/sdb5 bs=1M count=64 oflag=direct,dsync
>   64+0 records in
>   64+0 records out
>   67108864 bytes (67 MB) copied, 1.62426 s, 41.3 MB/s
> 
> 
> * benchmark without a disk write cache
> 
>   # hdparm -W 0 /dev/sdb
>   
>   /dev/sdb:
>    setting drive write-caching to 0 (off)
>    write-caching =  0 (off)
> 
>   # dd if=/dev/zero of=/dev/sdb5 bs=1M count=64 oflag=direct
>   64+0 records in
>   64+0 records out
>   67108864 bytes (67 MB) copied, 2.13579 s, 31.4 MB/s
> 
>   # dd if=/dev/zero of=/dev/sdb5 bs=1M count=64 oflag=direct,dsync
>   64+0 records in
>   64+0 records out
>   67108864 bytes (67 MB) copied, 2.1628 s, 31.0 MB/s
> 
>>
>> Furthermore, for users, setting cache=none or cache=off (yes, the code
>> tells me that we can pass 'off' to qemu) means our object cache is
>> enabled! Would you ever expect this behaviour as an ordinary user?
> 
> cache=none and cache=off only mean that QEMU doesn't use a page cache.
> It is a bit confusing, but that is the way QEMU uses it.
> 


Okay, I overlooked the disk write cache. I didn't expect QEMU to use the
'cache' option to control the page cache, the disk write cache, and sync
semantics all at once. That seems like an abuse of the option to me.

>>
>> I don't think QEMU's cache mode is well received, especially cache=none
>> meaning 'BDRV_O_NOCACHE | BDRV_O_CACHE_WB'. What does that mean literally?
>> Hmm, "don't give me a cache, but give me a writeback cache, please"?
>>
>>>  - I guess qemu users expect that if BDRV_O_NOCACHE is set, O_DIRECT
>>>    is used for file I/Os.
>>>
>>>  - If we ignore BDRV_O_NOCACHE here, we can use qemu-iotests for
>>>    Sheepdog cache tests with the following command:
>>>
>>>      $ check -sheepdog -nocache
>>>
>>>    where -nocache means BDRV_O_NOCACHE | BDRV_O_CACHE_WB.
>>>
>>
>>
>> I am confused by 'ignoring BDRV_O_NOCACHE'. Actually, I don't care about
>> qemu's cache mode flags. All I want is a control with binary semantics
>> that can disable/enable the object cache in sheepdog.
>>
>> Is there any better scheme?
>> Currently, cache=writeback enables it, and the other modes disable it.
> 
> My suggestion is:
>  - use a sheepdog object cache when cache=writeback or cache=none,
>    which means when BDRV_O_CACHE_WB is set
>  - don't use an object cache when cache=directsync or cache=writethrough,
>    which means BDRV_O_CACHE_WB is not set


Okay, this is a simple change: just remove the line that checks
'BDRV_O_NOCACHE'.

So now cache=writeback, cache=none, and cache=off enable sheepdog's
object cache, and the other modes disable it. I hope users won't get
confused later. Maybe we'd better write the usage down explicitly
somewhere, so that users who aren't familiar with the QEMU cache
option won't get confused.
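
To make the rule concrete, here is a rough sketch of it, not the actual
patch. The flag values are stand-ins for illustration, and both
parse_cache_mode() and sd_use_object_cache() are hypothetical helper
names that do not exist in QEMU or sheepdog. The only point is that the
decision depends on BDRV_O_CACHE_WB alone and ignores BDRV_O_NOCACHE.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Stand-in flag bits for illustration only; the real definitions
 * live in QEMU's block layer headers. */
#define BDRV_O_NOCACHE  0x0020
#define BDRV_O_CACHE_WB 0x0040

/* Hypothetical helper: map a cache= string to flag bits as discussed
 * in this thread. Returns -1 for an unknown mode. */
static int parse_cache_mode(const char *mode)
{
    assert(mode != NULL);

    if (!strcmp(mode, "none") || !strcmp(mode, "off"))
        return BDRV_O_NOCACHE | BDRV_O_CACHE_WB;
    if (!strcmp(mode, "writeback"))
        return BDRV_O_CACHE_WB;
    if (!strcmp(mode, "directsync"))
        return BDRV_O_NOCACHE;
    if (!strcmp(mode, "writethrough"))
        return 0;
    return -1;
}

/* Hypothetical helper: the object cache follows BDRV_O_CACHE_WB only;
 * BDRV_O_NOCACHE is deliberately not checked. */
static bool sd_use_object_cache(int bdrv_flags)
{
    return (bdrv_flags & BDRV_O_CACHE_WB) != 0;
}
```

With this, none, off and writeback turn the object cache on, while
directsync and writethrough leave it off.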

Thanks,
Yuan


