[sheepdog-users] Sheepdog image fills only one node

Liu Yuan namei.unix at gmail.com
Mon Jul 2 10:22:24 CEST 2012

On 07/02/2012 03:18 PM, Christoph Hellwig wrote:
> On Sat, Jun 30, 2012 at 12:53:13AM +0800, Liu Yuan wrote:
>> On 06/29/2012 11:05 PM, Stefan Priebe - Profihost AG wrote:
>>> # kvm ... -drive
>>> file=sheepdog:,if=none,id=drive-virtio0,cache=writeback,aio=native
>>> -device virtio-blk-pci,drive=drive-virtio0,id=virtio0
>> cache=writeback enables the object cache. See the wiki page about the
>> object cache, which only flushes dirty bits to the cluster.
> This is just another reason why enabling the object cache by default is
> wrong.  In my opinion, making sure it is not enabled by default is a high
> priority for the 0.4.0 release.
> Reasons:
>  - While qemu still defaults to cache=writethrough, all management tools
>    that people actually use (most importantly libvirt) change that to
>    cache=none
>  - With cache=none the new sheepdog version will get semantics that
>    people absolutely do not expect from a distributed block storage
>    system:
>       (1) data is not striped over different nodes for the actual
>           write, thus not getting any scale-out for big streaming
>           writes
>       (2) data is not written back to the cluster until a cache
>           flush happens, thus causing havoc when restarting a
>           VM on a different node after one node crashes
> That being said, I really like the object cache for some specific
> workloads, mostly in completely read-only mode for VDI COW base images,
> and even in write mode for cloud deployments, as a replacement
> for the default OpenStack semantics where images get downloaded to
> a local host and executed there.  But to make it useful for these
> use cases, the cache needs to default to off, and there needs to be a
> sheep-side configuration to enable it per VDI.  A good reason for
> that is, for example, when we want to enable it for the base image but
> not the overlay, which isn't even possible from qemu, even if we wanted
> to jump through all the hoops instead of making things work out of the box.
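The per-VDI switch proposed above could look roughly like the following. This is a minimal sketch, not sheep's actual configuration: the `CacheMode` values, the `cache_mode_for` helper and the VDI names are all hypothetical. It only shows the proposed policy, where the cache is off unless a VDI explicitly opts in. That lets a COW base image be cached read-only while its overlay is not.

```python
from enum import Enum


class CacheMode(Enum):
    # Hypothetical per-VDI cache modes (illustration only, not a sheep option).
    OFF = "off"              # every I/O goes straight to the cluster
    READ_ONLY = "read-only"  # e.g. immutable COW base images
    WRITEBACK = "writeback"  # dirty objects held locally until a flush


def cache_mode_for(vdi: str, policy: dict) -> CacheMode:
    """Return the cache mode configured for a VDI; unlisted VDIs default to OFF."""
    return policy.get(vdi, CacheMode.OFF)


# Hypothetical VDI names: cache the base image read-only, leave the overlay uncached.
policy = {"base-image": CacheMode.READ_ONLY}
print(cache_mode_for("base-image", policy).value)  # read-only
print(cache_mode_for("overlay", policy).value)     # off
```

Keying the decision on the VDI rather than on the qemu `-drive` option is what makes the base/overlay split expressible at all.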

I am not yet fully convinced that we should disable the object cache,
because we actually implemented the switch on the QEMU side. If that is
wrong, it was wrong from the very beginning to make QEMU, rather than
sheep, control the object cache switch. 'cache=none' is the most
notorious QEMU option: it ignores distributed semantics, and it is what
should be blamed, not the object cache itself.

When people set cache=writeback, I think they should know what they are doing.
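To make concrete what a write-back object cache implies, here is a toy model of the behaviour described in this thread. It is not sheep's implementation: `Cluster` and `VDI` are hypothetical stand-ins, and the dict only represents the replicated store. With write-back, writes stay as dirty objects on the local node and reach the cluster only on a flush. Anything unflushed is lost if that node crashes, which is the failover concern raised above. With write-through, each write lands in the cluster immediately.

```python
class Cluster:
    """Hypothetical stand-in for the replicated sheepdog object store."""

    def __init__(self):
        self.objects = {}  # object index -> data


class VDI:
    """Toy model of a VDI opened with or without a write-back object cache."""

    def __init__(self, cluster, writeback):
        self.cluster = cluster
        self.writeback = writeback
        self.dirty = {}  # objects held only on the local node

    def write(self, idx, data):
        if self.writeback:
            self.dirty[idx] = data            # stays local until flush()
        else:
            self.cluster.objects[idx] = data  # written through to the cluster

    def flush(self):
        # Push only the dirty objects to the cluster, then forget them locally.
        self.cluster.objects.update(self.dirty)
        self.dirty.clear()

    def crash(self):
        # The local node dies: unflushed dirty objects are gone.
        self.dirty.clear()


wb = VDI(Cluster(), writeback=True)
wb.write(0, b"a")
print(wb.cluster.objects)  # {}
wb.flush()
print(wb.cluster.objects)  # {0: b'a'}
```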

That being said, I am not against turning the object cache off by default.
It seems that users are always spoiled, so we should give them the most
conservative options by default. I'll submit a patch to turn it


