[Sheepdog] Cluster appears down but nodes report different epochs
MORITA Kazutaka
morita.kazutaka at lab.ntt.co.jp
Wed Nov 9 03:44:52 CET 2011
At Tue, 8 Nov 2011 10:56:51 -0500,
Shawn Moore wrote:
>
> > Probably, we need to add support for using different NICs for data
> > I/Os and monitoring.
>
> We currently have 4 1Gb nics bonded together using mode 4
> (LACP/802.3ad). On another note, I am still doing more testing, but
> it almost looks like the TOTAL cluster speed might be limited to 1Gb
> instead of up to 4Gb (I know I can't get that much in truth though,
> but should be higher than 1Gb). Does anyone have any insight into
> this? We are using enterprise grade switches with two cards (18Gb/s
> fabric).
How did you measure the total cluster speed? Could your disks be a
bottleneck?
>
>
> >> Shawn, did you format with -H or --nohalt option? If not, might be some
> >> bug in halt path.
>
> Yes, we need that option as we will have two zones with copies being
> some even number. So if we don't use -H and one zone goes offline,
> the cluster will quit serving data.
But the cluster info on blade161 says:
[root@blade161 ~]# collie cluster info
Cluster status: The sheepdog is stopped doing IO, short of living nodes
I think there are some bugs in the halt path.
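
To make the expected behaviour concrete, here is a minimal sketch of the halt rule as it is described in this thread. It is not sheepdog code: the function name and parameters are hypothetical, and it assumes the cluster stops serving I/O when fewer zones are alive than there are copies, unless the cluster was formatted with -H (--nohalt).

```python
def cluster_serves_io(living_zones: int, copies: int, nohalt: bool) -> bool:
    """Hypothetical model of the halt decision; not taken from sheepdog.

    Assumption: without nohalt, I/O stops once the number of living
    zones drops below the configured number of copies.
    """
    if nohalt:
        # Formatted with -H / --nohalt: keep serving I/O regardless.
        return True
    return living_zones >= copies


if __name__ == "__main__":
    # Two zones, two copies, as in the setup discussed above.
    print(cluster_serves_io(living_zones=2, copies=2, nohalt=False))
    print(cluster_serves_io(living_zones=1, copies=2, nohalt=False))
    print(cluster_serves_io(living_zones=1, copies=2, nohalt=True))
```

Under that assumption, a cluster formatted with -H should not report "short of living nodes" after losing one zone, which is why the output above points to a bug.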
Thanks,
Kazutaka