[Sheepdog] [PATCH 2/2] object cache: introduce async flush

Liu Yuan namei.unix at gmail.com
Fri Apr 6 11:22:51 CEST 2012


On 04/05/2012 11:09 PM, MORITA Kazutaka wrote:

> At Wed, 04 Apr 2012 01:19:10 +0800,
> Liu Yuan wrote:
>>
>> On 04/04/2012 01:00 AM, MORITA Kazutaka wrote:
>>> At Mon,  2 Apr 2012 16:21:11 +0800,
>>> Liu Yuan wrote:
>>>>
>>>> From: Liu Yuan <tailai.ly at taobao.com>
>>>>
>>>> We flush dirty objects asynchronously by default to achieve the best
>>>> performance. If users prefer strong consistency over performance, they
>>>> can launch sheep with the -S or --sync option.
>>>>
>>>> We need async flush because:
>>>> 	1) Some applications are response-time sensitive. Writeback of dirty
>>>> 	bits in the guest hurts response time badly, because the guest has to
>>>> 	await its completion, which is a considerably long operation in the
>>>> 	sheep cluster.
>>>> 	2) Some applications are just memory- and CPU-intensive and have
>>>> 	little concern for disk data (e.g. they use the disk only to store
>>>> 	application logs).
>>>> 	3) People may simply prefer performance over consistency.
>>>
>>> Sheepdog is block device storage.  This kind of feature must NOT be
>>> the default.  In addition, we should show a warning about the risk of
>>> reading old data, which could cause filesystem corruption, when users
>>> enable this feature.
>>>
>>
>> Okay, I'll submit a patch to make it optional.
>>
>> But in what way do we risk reading stale data? With the cache enabled,
>> guests always try to read objects from the cache, IMO.
> 
> If the gateway node fails, the flushed data would be lost.
> 


Yes, but disk is not a volatile medium like memory, so a failure doesn't
strictly mean data loss. When the node is rebooted and, as in the normal
case, the disk is okay, we can flush the data again.

I guess we really do need async flush to put the cluster to good use
currently, because:

1) I noticed that sync requests are mostly issued by the file system's
metadata updates, which are very harsh on errors: a single EIO will make
the file system set itself read-only. This is evidenced by our test
cluster: while the cluster is recovering, there is a high probability
that a flush request fails, which in sync flush mode is reported to the
guest as EIO, resulting in a large set of guests being set read-only.

2) The current object cache flushes objects in a very coarse unit (4 MB),
making the flush operation slower than necessary. This can be mitigated
later if we support flushing in finer units, by adding a more complex
data structure to manage the dirty data.

That being said, I suggest making async flush the default for now.

Thanks,
Yuan



