[Sheepdog] Cluster appears down but nodes report different epochs

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Thu Nov 10 08:11:42 CET 2011


At Wed, 9 Nov 2011 09:34:29 -0500,
Shawn Moore wrote:
> 
> On Tue, Nov 8, 2011 at 9:44 PM, MORITA Kazutaka
> <morita.kazutaka at lab.ntt.co.jp> wrote:
> > At Tue, 8 Nov 2011 10:56:51 -0500,
> > Shawn Moore wrote:
> >>
> >> > Probably, we need to add support for using different NICs for data
> >> > I/Os and monitoring.
> >>
> >> We currently have 4 1Gb nics bonded together using mode 4
> >> (LACP/802.3ad).  On another note, I am still doing more testing, but
> >> it almost looks like the TOTAL cluster speed might be limited to 1Gb
> >> instead of up to 4Gb (I know I can't get that much in truth though,
> >> but should be higher than 1Gb).  Does anyone have any insight into
> >> this?  We are using enterprise grade switches with two cards (18Gb/s
> >> fabric).
> >
> > How did you measure the total cluster speed?  Could your disks be a
> > bottleneck?
> 
> What I have been doing is using pssh (parallel ssh) to kick off a
> script on all nodes at the same time to create a pre-allocated vdi of
> the same size.  I then take the time the operation takes to complete
> on each node and do some math.  Below is the script and an example of
> the data obtained.  I did go ahead and re-run my tests with tmpfs
> instead of true ext4 disks.  This did yield quite a performance
> boost, but it seems it should have been even faster given that RAM
> was backing the disks.
> 
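> # Time the creation of one pre-allocated VDI; $1 is the VDI size in GB.
> T_START="$(date +%s)"
> collie vdi create "test_$(uname -n)" "${1:?usage: $0 SIZE_GB}G" -P
> T_STOP="$(date +%s)"
> # Print the elapsed wall-clock time in seconds.
> echo "$((T_STOP - T_START))"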
> 
> 
> This is a run using the disks.  It created nine 3GB VDIs.  With copies
> 3, that is 27GB of VDI data and 81GB of total cluster data, for a
> total cluster speed of 1.46Gb/s.
> 152	416	sec
> 153	462	sec
> 154	394	sec
> 155	435	sec
> 156	451	sec
> 157	483	sec
> 159	459	sec
> 160	406	sec
> 161	486	sec
> VDI SIZE	3
> COPIES	3
> TOT sec	3992
> AVG sec	443.5555556
> 27	GB
> 27648	MB
> 62.33266533	MB/s
> 81	GB
> 82944	MB
> 186.997996	MB/s
> 1495.983968	Mb/s
> 1.460921844	Gb/s
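
For reference, the summary rows above can be reproduced from the
per-node times.  The sketch below is only an illustration and was not
part of the original test script: the function name throughput is
hypothetical, and it assumes 1 GB = 1024 MB, 1 Gb = 1024 Mb, and that
throughput is the total data divided by the average per-node time,
which is what the quoted figures imply.

```shell
#!/bin/sh
# Recompute the summary rows from the per-node creation times.
# Assumptions (not stated in the thread): 1 GB = 1024 MB,
# 1 Gb = 1024 Mb, throughput = total data / AVERAGE per-node time.
throughput() {
    # $1 = VDI size in GB, $2 = copies, remaining args = seconds per node
    size_gb=$1; copies=$2; shift 2
    printf '%s\n' "$@" | LC_ALL=C awk -v size="$size_gb" -v copies="$copies" '
        { total += $1; n++ }
        END {
            avg = total / n                 # AVG sec
            vdi_mb = n * size * 1024        # VDI data in MB
            cluster_mb = vdi_mb * copies    # data including replicas
            printf "avg_sec=%.2f\n", avg
            printf "vdi_MBps=%.2f\n", vdi_mb / avg
            printf "cluster_MBps=%.2f\n", cluster_mb / avg
            printf "cluster_Gbps=%.2f\n", cluster_mb / avg * 8 / 1024
        }'
}

# Disk run: nine nodes, 3 GB VDIs, copies 3.
throughput 3 3 416 462 394 435 451 483 459 406 486
# avg_sec=443.56
# vdi_MBps=62.33
# cluster_MBps=187.00
# cluster_Gbps=1.46
```

Because the nodes run in parallel, dividing by the slowest node's time
(486 sec) instead of the average would be a more conservative estimate;
for this run that gives about 1.33 Gb/s.
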
> 
> 
> This is a run using tmpfs.  It created nine 3GB VDIs.  With copies 3,
> that is 27GB of VDI data and 81GB of total cluster data, for a total
> cluster speed of 5.73Gb/s.

Hmm, actually the performance looks worse than what I expected.  I'll
dig into this issue after implementing the accord cluster driver.

Thanks,

Kazutaka

> 152	80	sec
> 153	119	sec
> 154	127	sec
> 155	118	sec
> 156	116	sec
> 157	142	sec
> 159	80	sec
> 160	125	sec
> 161	110	sec
> VDI SIZE	3
> COPIES	3
> TOT sec	1017
> AVG sec	113
> 27	GB
> 27648	MB
> 244.6725664	MB/s
> 81	GB
> 82944	MB
> 734.0176991	MB/s
> 5872.141593	Mb/s
> 5.734513274	Gb/s
> -- 
> sheepdog mailing list
> sheepdog at lists.wpkg.org
> http://lists.wpkg.org/mailman/listinfo/sheepdog
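
To put the two runs side by side, the averages quoted above give the
tmpfs speedup over the disk run and the average share of the aggregate
rate per node.  This is a rough sketch: compare_runs is a hypothetical
helper, and the per-node figure assumes the load is spread evenly over
the nine nodes, which the thread does not confirm.

```shell
#!/bin/sh
# Compare the disk and tmpfs runs using the averages quoted in the thread.
# compare_runs is a hypothetical helper, not part of sheepdog or collie.
compare_runs() {
    # $1 = disk AVG sec, $2 = tmpfs AVG sec,
    # $3 = tmpfs cluster Gb/s, $4 = number of nodes
    LC_ALL=C awk -v disk="$1" -v tmpfs="$2" -v gbps="$3" -v nodes="$4" 'BEGIN {
        # How many times faster the tmpfs run finished on average.
        printf "speedup=%.2f\n", disk / tmpfs
        # Aggregate rate divided evenly across nodes (an assumption).
        printf "per_node_Gbps=%.2f\n", gbps / nodes
    }'
}

compare_runs 443.5555556 113 5.734513274 9
# speedup=3.93
# per_node_Gbps=0.64
```
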


