[sheepdog-users] Corosync Vs Zookeeper Backends

Liu Yuan namei.unix at gmail.com
Fri Mar 14 15:58:35 CET 2014


On Fri, Mar 14, 2014 at 02:43:59PM +0000, Aydelott, Ryan M. wrote:
> Interesting note on the data management, I have not dug much into the internals with Sheepdog yet - but the only possible explanation would be if any metadata on file object positioning was being retrieved through the backend.

Each volume has only a 4MB meta object that tracks the allocation bitmap for its
data objects, mainly for thin provisioning. We don't record any placement metadata
at all; instead we rely 100% on consistent hashing for the placement of all objects,
which gives very even load and space balance. We have run 150 nodes with 12 disks on
each node and confirmed that load and space were evenly distributed. As you noticed,
we don't have meta servers.
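
For illustration, here is a minimal sketch of consistent-hash placement; this is
not sheepdog's actual code, and the hash function, vnode count and names are my
own assumptions. It just shows how an object's location can be computed purely
from the object ID and the node list, with no placement metadata:

    # Hypothetical sketch of consistent-hash placement. Sheepdog's real
    # implementation differs (hash function, vnode count, replica logic).
    import hashlib
    from bisect import bisect_right

    def _hash(key):
        # Map an arbitrary key to a point on the hash ring.
        return int(hashlib.sha1(key.encode()).hexdigest(), 16)

    def build_ring(nodes, vnodes=64):
        # Each node contributes several virtual points so load stays even.
        return sorted((_hash("%s-%d" % (n, i)), n)
                      for n in nodes for i in range(vnodes))

    def locate(ring, object_id):
        # An object lives on the first node clockwise from its hash point.
        points = [p for p, _ in ring]
        idx = bisect_right(points, _hash(object_id)) % len(ring)
        return ring[idx][1]

    ring = build_ring(["node-%d" % i for i in range(150)])
    print(locate(ring, "vdi42/obj000001"))

Because placement is a pure function of the object ID and the current node list,
there is nothing to look up on a meta server.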

> 
> I agree with your statements, as the IO on the zookeeper nodes is very small, nowhere near the data we are pushing through the sheep daemons.
> 
> Sheepdog is being started as follows for each test iteration:
> 
> Corosync: sheep -n -c corosync:172.21.5.0 /meta,/var/lib/sheepdog/disc0,/var/lib/sheepdog/disc1,/var/lib/sheepdog/disc2,/var/lib/sheepdog/disc3,/var/lib/sheepdog/disc4,/var/lib/sheepdog/disc5,/var/lib/sheepdog/disc6,/var/lib/sheepdog/disc7,/var/lib/sheepdog/disc8,/var/lib/sheepdog/disc9,/var/lib/sheepdog/disc10,/var/lib/sheepdog/disc11,/var/lib/sheepdog/disc12,/var/lib/sheepdog/disc13
> 
> Zookeeper: sheep -n -c zookeeper:172.21.5.161:2181,172.21.5.162:2181,172.21.5.163:2181,172.21.5.164:2181,172.21.5.165:2181 /meta /var/lib/sheepdog/disc0,/var/lib/sheepdog/disc1,/var/lib/sheepdog/disc2,/var/lib/sheepdog/disc3,/var/lib/sheepdog/disc4,/var/lib/sheepdog/disc5,/var/lib/sheepdog/disc6,/var/lib/sheepdog/disc7,/var/lib/sheepdog/disc8,/var/lib/sheepdog/disc9,/var/lib/sheepdog/disc10,/var/lib/sheepdog/disc11,/var/lib/sheepdog/disc12,/var/lib/sheepdog/disc13
> 
> The driver we wrote/use is: https://github.com/devoid/nova/tree/sheepdog-nova-support-havana
> 
> Which builds out libvirt.xml as follows: 
> 
>     <disk type="network" device="disk">
>       <driver name="qemu" cache="writethrough"/>
>       <source protocol="sheepdog" name="//172.21.5.141:7000/instance_f9dc065b-d05d-47cb-a3e6-b02049f049df_disk"/>
>       <target bus="virtio" dev="vda"/>
>     </disk>
> 

Hmm, since you don't use the object cache, you might try setting 'cache=none' to save
one extra internal flush operation in QEMU. But this won't make a big difference.
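
For example, the disk element you quoted above would become the following; only
the cache attribute changes, everything else your driver generates stays the same:

    <disk type="network" device="disk">
      <driver name="qemu" cache="none"/>
      <source protocol="sheepdog" name="//172.21.5.141:7000/instance_f9dc065b-d05d-47cb-a3e6-b02049f049df_disk"/>
      <target bus="virtio" dev="vda"/>
    </disk>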

I have no idea why you see such a huge difference with zookeeper. Weird. Is this
result reproducible? I suspect something is wrong elsewhere in the configuration or the system.

P.S. I have long wanted someone with access to IB hardware to develop a native IB
network transport for sheepdog instead of IP-over-IB.

Thanks
Yuan
