[sheepdog-users] Corosync Vs. Zookeeper

Andrew J. Hobbs ajhobbs at desu.edu
Wed Mar 19 16:23:12 CET 2014


So I went back and looked at an earlier mail..

   <disk type="network" device="disk">
        <driver name="qemu" cache="writethrough"/>
        <source protocol="sheepdog" name="//172.21.5.141:7000/instance_f9dc065b-d05d-47cb-a3e6-b02049f049df_disk"/>
        <target bus="virtio" dev="vda"/>
      </disk>

Is the XML definition for your image the same in both cases?  Why are 
you specifying the IP address and port in the name, unless you're 
running the image on a separate machine from the sheepdog cluster?

For example, my disk def for one of my images reads:

     <disk type='network' device='disk'>
       <driver name='qemu' type='raw'/>
       <source protocol='sheepdog' name='perftest'/>
       <target dev='vda' bus='virtio'/>
     </disk>
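
If the gateway really does live on another machine, I believe libvirt 
can also take the host and port as a <host> element on the source 
instead of folding them into the name; something like the following 
(the address is the one from your earlier mail, the image name is just 
a placeholder):

     <disk type='network' device='disk'>
       <driver name='qemu' type='raw'/>
       <source protocol='sheepdog' name='image_name'>
         <host name='172.21.5.141' port='7000'/>
       </source>
       <target dev='vda' bus='virtio'/>
     </disk>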


A couple of notes I've collected.  Writethrough has been problematic 
for me: significant write pressure resulted in Qemu segfaulting.  I've 
settled on the default cache mode, which has been perfect in 
production.  I'm not certain (can a developer confirm this?), but 
sheep does open a local socket, and whether Qemu is communicating over 
that socket or via IP could impact performance.
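
If you want to see which path Qemu is actually using, something like 
this on the host running the guest might tell you (the PID is just a 
stand-in for your qemu process):

     # 12345 stands in for the qemu process id; adjust to your setup
     lsof -a -p 12345 -i tcp:7000   # TCP connection to a sheep gateway on port 7000?
     lsof -a -p 12345 -U            # or a local unix-domain socket?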

For a comparison, I've run your benchmark line on a virtual machine on 
my zookeeper/1G interconnect cluster.

# dog node md info -A

Id    Size    Used    Avail    Use%    Path

Node 0:
  0    923 GB    70 GB    852 GB      7%    /var/lib/sheepdog/obj
Node 1:
  0    928 GB    80 GB    847 GB      8%    /var/lib/sheepdog/obj
Node 2:
  0    928 GB    74 GB    854 GB      7%    /var/lib/sheepdog/obj
Node 3:
  0    695 GB    53 GB    643 GB      7%    //var/lib/sheepdog1
  1    696 GB    54 GB    642 GB      7%    //var/lib/sheepdog2
Node 4:
  0    4.6 TB    200 GB    4.4 TB      4%    /var/lib/sheepdog/obj
Node 5:
  0    928 GB    52 GB    876 GB      5%    //var/lib/sheepdog1
  1    2.0 TB    124 GB    1.9 TB      6%    //var/lib/sheepdog2

Yes, my production cluster is a mismatched conglomeration.  :)

     Run began: Wed Mar 19 11:01:16 2014

     Record Size 128 KB
     File size set to 20971520 KB
     Command line used: iozone -i0 -i1 -t 1 -r 128k -s 20G
     Output is in Kbytes/sec
     Time Resolution = 0.000001 seconds.
     Processor cache size set to 1024 Kbytes.
     Processor cache line size set to 32 bytes.
     File stride size set to 17 * record size.
     Throughput test with 1 process
     Each process writes a 20971520 Kbyte file in 128 Kbyte records

     Children see throughput for  1 initial writers     =   43101.81 KB/sec
     Parent sees throughput for  1 initial writers     =   42271.95 KB/sec
     Min throughput per process             =   43101.81 KB/sec
     Max throughput per process             =   43101.81 KB/sec
     Avg throughput per process             =   43101.81 KB/sec
     Min xfer                     = 20971520.00 KB

     Children see throughput for  1 rewriters     =  654497.62 KB/sec
     Parent sees throughput for  1 rewriters     =   74470.76 KB/sec
     Min throughput per process             =  654497.62 KB/sec
     Max throughput per process             =  654497.62 KB/sec
     Avg throughput per process             =  654497.62 KB/sec
     Min xfer                     = 20971520.00 KB

     Children see throughput for  1 readers         =  329881.72 KB/sec
     Parent sees throughput for  1 readers         =  329876.83 KB/sec
     Min throughput per process             =  329881.72 KB/sec
     Max throughput per process             =  329881.72 KB/sec
     Avg throughput per process             =  329881.72 KB/sec
     Min xfer                     = 20971520.00 KB

     Children see throughput for 1 re-readers     =  337342.66 KB/sec
     Parent sees throughput for 1 re-readers     =  337198.12 KB/sec
     Min throughput per process             =  337342.66 KB/sec
     Max throughput per process             =  337342.66 KB/sec
     Avg throughput per process             =  337342.66 KB/sec
     Min xfer                     = 20971520.00 KB

Now, my rewriter, reader, and re-reader numbers are likely inflated 
thanks to the sheepdog cache.  The node the run was on has:

root     29783  1.5  0.1 12824732 41104 ?      Sl   Mar14 108:32 sheep -c zookeeper:10.254.0.5:2181,10.254.0.6:2181,10.254.0.1:2181 -D -w directio size=100G dir=//var/lib/sheepcache -y 10.254.0.5 -p 7000 //var/lib/sheepdog

But my numbers compare favorably to your results, which makes me 
wonder if perhaps there's something else going on.  Are your nodes 
communicating over a 1G link for some strange reason, rather than over 
the IPoIB interface?  I'm not sure.  The numbers I have are roughly 
what I'd expect from my 1G links.
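
For what it's worth, this is roughly how I'd sanity-check which link 
the sheep traffic is on (interface names here are only examples):

     dog node list                   # which addresses are the sheep daemons bound to?
     ip -o addr show                 # which interface actually owns those addresses?
     ethtool eth0 | grep -i speed    # link speed of the suspect 1G interface
     cat /sys/class/net/ib0/mode     # ipoib datagram vs connected mode, if you have ib0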
