[sheepdog-users] Concern about sheepdog performance

Liu Yuan namei.unix at gmail.com
Tue Dec 18 14:25:32 CET 2012


On 12/18/2012 08:51 PM, Valerio Pachera wrote:
> 0.5.5_6_gb3f888b
> 
> journal is on a separate device (sdb) on all three nodes.
> 
> WITHOUT CACHE AND JOURNAL
> - /mnt/sheepdog and /mnt/sdb1/sdj are using xfs with
> 'noatime,barrier=0' mount options.
> - sheep -j dir=/mnt/sdb1/sdj,size=512 -w object:size=20000 /mnt/sheepdog/
> - kvm -drive file=sheepdog:test
> - on the guest: dd if=/dev/zero of=/mnt/sda1 bs=1M count=512
>   8.9M/s
> 
> WITH CACHE AND JOURNAL
> - -drive file=sheepdog:test,if=virtio,cache=writeback
> - on the guest: dd if=/dev/zero of=/mnt/vda1 bs=1M count=512
>   190M/s
> 
> WITH CACHE AND JOURNAL WRITING FILE TWICE
> - on the guest: dd if=/dev/zero of=/mnt/vda1 bs=1M count=512
>   242M/s
> 
> TIME USED BY SYNC
> Correct me if I'm wrong:
>   dd reports a 'fake' speed because the data is not yet written to the
> disks; it is still in the cache.
> 

Well, the object cache is in fact stored persistently *on* disk. It lives on
the node that hosts your VMs. Simply put, with object cache enabled, a
write returns as soon as it reaches the local node's disk. Without object
cache, a write returns *only* when all the copies of the object have
reached their respective nodes over the network. So

[assume you have 3 copies]
writeback with object cache: single write on local node.
no object cache: three writes + transfer of objects between nodes
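
The comparison above can be sketched as a count of what sits on the
critical path of a single guest write. This is only a toy model, not
sheepdog code: the function name `writes_before_ack` and the mode strings
are made up for illustration, and the default of 3 copies is the assumption
stated above.

```python
def writes_before_ack(mode, copies=3):
    """Return (disk_writes, needs_network) that must complete before the
    guest's write is acknowledged. Toy model only, not sheepdog code."""
    if mode == "writeback-cache":
        # Object cache in writeback mode: the write is acknowledged once it
        # reaches the disk of the local node; replication waits for 'sync'.
        return 1, False
    if mode == "no-cache":
        # No object cache: every copy must reach its node before the ack,
        # so objects also have to travel over the network.
        return copies, True
    raise ValueError("unknown mode: %r" % mode)


for mode in ("writeback-cache", "no-cache"):
    disk_writes, needs_network = writes_before_ack(mode)
    print(mode, disk_writes, needs_network)
```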

This is why we get a noticeable speedup: we sacrifice data reliability
for a small window, since only one copy is stored until we get a 'sync'
request. For object cache, a 'sync' request flushes the dirty data to the
cluster. The kernel is guaranteed to issue one periodically, every 30
seconds, even if the user doesn't explicitly issue it from userspace, so
after 'sync' all 3 copies are up to date.

That said, object cache is more flexible than this. It also supports
writethrough mode, which doesn't sacrifice reliability at all and still
benefits reads a lot. 'Writethrough' means we write all 3 copies through
the cache to the cluster and also keep an up-to-date copy on the local node.
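
To make the two modes concrete, here is a small toy model of the behaviour
described above. It is a sketch for illustration only and does not reflect
sheepdog's actual implementation: the class `ToyObjectCache`, its methods,
and the dictionaries standing in for the local disk and the cluster are all
hypothetical.

```python
class ToyObjectCache:
    """Toy model of writeback vs. writethrough; not sheepdog code."""

    def __init__(self, mode, copies=3):
        if mode not in ("writeback", "writethrough"):
            raise ValueError("unknown mode: %r" % mode)
        self.mode = mode
        self.copies = copies
        self.local = {}      # object id -> data held in the local cache
        self.cluster = {}    # object id -> data replicated in the cluster
        self.dirty = set()   # cached objects not yet pushed to the cluster

    def write(self, oid, data):
        self.local[oid] = data
        if self.mode == "writethrough":
            # Push to the cluster before acknowledging the write.
            self.cluster[oid] = data
        else:
            # Writeback: acknowledge now, replicate on the next sync.
            self.dirty.add(oid)

    def sync(self):
        """Flush dirty objects to the cluster."""
        for oid in self.dirty:
            self.cluster[oid] = self.local[oid]
        self.dirty.clear()

    def replicated_copies(self, oid):
        """Up-to-date copies in the cluster (0 if not flushed yet)."""
        if oid in self.cluster and self.cluster[oid] == self.local.get(oid):
            return self.copies
        return 0

    def read_is_local(self, oid):
        """Reads are served locally whenever the object is cached."""
        return oid in self.local


wb = ToyObjectCache("writeback")
wb.write("obj1", b"data")
print(wb.replicated_copies("obj1"))  # before sync: only the local copy exists
wb.sync()
print(wb.replicated_copies("obj1"))  # after sync: the cluster is up to date
```

In writethrough mode the same `write` call leaves the cluster up to date
immediately, while reads can still be served from the local copy.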

> To know how fast data is synced from cache to disks I used this:
>   sync; time (dd if=/dev/zero of=/mnt/vda1 bs=1M count=512; sync)
> This way I know the elapsed time from when dd starts until sync has
> finished writing the data.
> 
> WITH CACHE AND JOURNAL
>   26.8 s -> 512/26.8 = 19.1M/s
> 
> I tried also to remove 'noatime,barrier=0' mount options, and it seems
> there's no difference.
> 
> 
> WITH CACHE AND JOURNAL WRITING FILE TWICE
>   25.3 s -> 512/25.3 = 20.2M/s
> 
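
The arithmetic in the quoted measurements can be reproduced with a few
lines. This is just a sketch of the calculation; the helper name
`effective_throughput` is made up here, and the inputs are the figures
quoted above (512 MB written, elapsed time measured up to the end of sync).

```python
def effective_throughput(megabytes, seconds):
    """MB/s for a transfer that only counts as done once sync has returned."""
    return megabytes / seconds


# Figures quoted above: 512 MB written with dd, elapsed time including sync.
for label, seconds in (("first write", 26.8), ("second write", 25.3)):
    print("%s: %.1f M/s" % (label, effective_throughput(512, seconds)))
```

Compared with the 190M/s and 242M/s that dd reports on its own, this puts
the sustained rate at roughly a tenth of the figure seen before the flush.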

How about using journaling mode only (no object cache)? I'd guess it will
be a bit more than 20M/s.

Thanks,
Yuan


