[Sheepdog] Cluster appears down but nodes report different epochs

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Tue Nov 8 12:22:28 CET 2011


At Tue, 08 Nov 2011 14:06:54 +0800,
Liu Yuan wrote:
> 
> On 11/08/2011 01:33 PM, MORITA Kazutaka wrote:
> 
> > At Mon, 7 Nov 2011 10:03:19 -0500,
> > Shawn Moore wrote:
> >>
> >> When I checked on the cluster this morning I see the following from
> >> cluster info.  A sheep and corosync process was found on all nodes
> >> except blade162 which didn't have a sheep process but did have a
> >> corosync one.  I'm not sure what has happened.  We have not had a
> > 
> > In blade162.log:
> > 
> >   Nov 05 00:06:30 sd_leave_handler(1222) Network Patition Bug: I should have exited.
> > 
> > This is probably a corosync bug, and Yunkai is trying to fix it.
> > 
> >   http://lists.wpkg.org/pipermail/sheepdog/2011-November/001835.html
> > 
> > 
> >> network interruption that we are aware of as all nodes are on the same
> >> switch (along with countless other production systems).  Logs from
> >> each node can be found
> >> http://www.stormpoint.com/files/sd_2011-11-07.zip.  Total
> >> un-compressed size is ~ 254MB and this download size is around 21MB.
> >> When I left Friday, this is how our cluster looked:
> >>
> >> All nodes were running version 0.2.4_63_gd56e3b6
> >>
> >>    Idx - Host:Port          Vnodes       Zone
> >> ---------------------------------------------
> >>      0 - 192.168.217.152:7000 	64          1
> >>      1 - 192.168.217.153:7000 	64          1
> >>      2 - 192.168.217.154:7000 	64          1
> >>      3 - 192.168.217.155:7000 	64          1
> >>      4 - 192.168.217.156:7000 	64          1
> >>      5 - 192.168.217.157:7000 	64          2
> >>      6 - 192.168.217.159:7000 	64          2
> >>      7 - 192.168.217.160:7000 	64          2
> >>      8 - 192.168.217.161:7000 	64          2
> >>      9 - 192.168.217.162:7000 	64          2
> >>
> >> [root@blade152 sheep]# collie cluster info
> >> Cluster status: running
> >>
> >> Cluster created at Wed Nov  2 11:02:26 2011
> >>
> >> Epoch Time           Version
> >> 2011-11-04 17:26:22     14 [192.168.217.152:7000]
> >> 2011-11-04 17:26:22     13 [192.168.217.152:7000, 192.168.217.162:7000]
> >> 2011-11-04 17:26:22     12 [192.168.217.152:7000,
> >> 192.168.217.160:7000, 192.168.217.162:7000]
> >> 2011-11-04 17:26:22     11 [192.168.217.152:7000,
> >> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.162:7000]
> >> 2011-11-04 17:26:22     10 [192.168.217.152:7000,
> >> 192.168.217.157:7000, 192.168.217.159:7000, 192.168.217.160:7000,
> >> 192.168.217.162:7000]
> >> 2011-11-04 17:26:21      9 [192.168.217.152:7000,
> >> 192.168.217.156:7000, 192.168.217.157:7000, 192.168.217.159:7000,
> >> 192.168.217.160:7000, 192.168.217.162:7000]
> >> 2011-11-04 17:26:21      8 [192.168.217.152:7000,
> >> 192.168.217.155:7000, 192.168.217.156:7000, 192.168.217.157:7000,
> >> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.162:7000]
> >> 2011-11-04 17:26:21      7 [192.168.217.152:7000,
> >> 192.168.217.154:7000, 192.168.217.155:7000, 192.168.217.156:7000,
> >> 192.168.217.157:7000, 192.168.217.159:7000, 192.168.217.160:7000,
> >> 192.168.217.162:7000]
> >>
> >>
> >> [root@blade153 ~]# collie cluster info
> >> Cluster status: running
> >>
> >> Cluster created at Wed Nov  2 11:02:26 2011
> >>
> >> Epoch Time           Version
> >> 2011-11-05 00:05:19     14 [192.168.217.153:7000]
> >> 2011-11-05 00:05:19     13 [192.168.217.153:7000, 192.168.217.162:7000]
> >> 2011-11-05 00:05:19     12 [192.168.217.153:7000,
> >> 192.168.217.160:7000, 192.168.217.162:7000]
> >> 2011-11-05 00:05:19     11 [192.168.217.153:7000,
> >> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.162:7000]
> >> 2011-11-05 00:05:19     10 [192.168.217.153:7000,
> >> 192.168.217.157:7000, 192.168.217.159:7000, 192.168.217.160:7000,
> >> 192.168.217.162:7000]
> >> 2011-11-05 00:05:19      9 [192.168.217.153:7000,
> >> 192.168.217.156:7000, 192.168.217.157:7000, 192.168.217.159:7000,
> >> 192.168.217.160:7000, 192.168.217.162:7000]
> >> 2011-11-05 00:05:18      8 [192.168.217.153:7000,
> >> 192.168.217.155:7000, 192.168.217.156:7000, 192.168.217.157:7000,
> >> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.162:7000]
> >> 2011-11-05 00:05:18      7 [192.168.217.153:7000,
> >> 192.168.217.154:7000, 192.168.217.155:7000, 192.168.217.156:7000,
> >> 192.168.217.157:7000, 192.168.217.159:7000, 192.168.217.160:7000,
> >> 192.168.217.162:7000]
> >>
> >>
> >> [root@blade154 ~]# collie cluster info
> >> Cluster status: running
> >>
> >> Cluster created at Wed Nov  2 11:02:26 2011
> >>
> >> Epoch Time           Version
> >> 2011-11-04 13:25:06      6 [192.168.217.152:7000,
> >> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> >> 192.168.217.156:7000, 192.168.217.157:7000, 192.168.217.159:7000,
> >> 192.168.217.160:7000, 192.168.217.162:7000]
> >> 2011-11-04 06:58:12      5 [192.168.217.152:7000,
> >> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> >> 192.168.217.156:7000, 192.168.217.157:7000, 192.168.217.159:7000,
> >> 192.168.217.160:7000, 192.168.217.161:7000, 192.168.217.162:7000]
> >> 2011-11-04 05:57:43      4 [192.168.217.152:7000,
> >> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> >> 192.168.217.156:7000, 192.168.217.159:7000, 192.168.217.160:7000,
> >> 192.168.217.161:7000, 192.168.217.162:7000]
> >> 2011-11-02 10:49:34      3 [192.168.217.152:7000,
> >> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> >> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.161:7000,
> >> 192.168.217.162:7000]
> >> 2011-11-02 10:33:44      2 [192.168.217.152:7000,
> >> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> >> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.162:7000]
> >> 2011-11-02 07:01:26      1 [192.168.217.152:7000,
> >> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> >> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.161:7000,
> >> 192.168.217.162:7000]
> >>
> >>
> >> [root@blade155 ~]# collie cluster info
> >> Cluster status: running
> >>
> >> Cluster created at Wed Nov  2 11:02:26 2011
> >>
> >> Epoch Time           Version
> >> 2011-11-04 13:24:42      6 [192.168.217.152:7000,
> >> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> >> 192.168.217.156:7000, 192.168.217.157:7000, 192.168.217.159:7000,
> >> 192.168.217.160:7000, 192.168.217.162:7000]
> >> 2011-11-04 06:57:48      5 [192.168.217.152:7000,
> >> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> >> 192.168.217.156:7000, 192.168.217.157:7000, 192.168.217.159:7000,
> >> 192.168.217.160:7000, 192.168.217.161:7000, 192.168.217.162:7000]
> >> 2011-11-04 05:57:19      4 [192.168.217.152:7000,
> >> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> >> 192.168.217.156:7000, 192.168.217.159:7000, 192.168.217.160:7000,
> >> 192.168.217.161:7000, 192.168.217.162:7000]
> >> 2011-11-02 10:49:07      3 [192.168.217.152:7000,
> >> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> >> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.161:7000,
> >> 192.168.217.162:7000]
> >> 2011-11-02 10:33:17      2 [192.168.217.152:7000,
> >> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> >> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.162:7000]
> >> 2011-11-02 07:00:59      1 [192.168.217.152:7000,
> >> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> >> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.161:7000,
> >> 192.168.217.162:7000]
> >>
> >>
> >> [root@blade156 ~]# collie cluster info
> >> Cluster status: running
> >>
> >> Cluster created at Wed Nov  2 11:02:26 2011
> >>
> >> Epoch Time           Version
> >> 2011-11-05 07:39:11      9 [192.168.217.154:7000,
> >> 192.168.217.155:7000, 192.168.217.156:7000, 192.168.217.157:7000,
> >> 192.168.217.159:7000, 192.168.217.160:7000]
> >> 2011-11-05 07:39:11      8 [192.168.217.154:7000,
> >> 192.168.217.155:7000, 192.168.217.156:7000, 192.168.217.157:7000,
> >> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.162:7000]
> >> 2011-11-04 18:47:30      7 [192.168.217.153:7000,
> >> 192.168.217.154:7000, 192.168.217.155:7000, 192.168.217.156:7000,
> >> 192.168.217.157:7000, 192.168.217.159:7000, 192.168.217.160:7000,
> >> 192.168.217.162:7000]
> >> 2011-11-04 17:26:26      6 [192.168.217.152:7000,
> >> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> >> 192.168.217.156:7000, 192.168.217.157:7000, 192.168.217.159:7000,
> >> 192.168.217.160:7000, 192.168.217.162:7000]
> >> 2011-11-04 10:59:30      5 [192.168.217.152:7000,
> >> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> >> 192.168.217.156:7000, 192.168.217.157:7000, 192.168.217.159:7000,
> >> 192.168.217.160:7000, 192.168.217.161:7000, 192.168.217.162:7000]
> >> 2011-11-04 09:59:03      4 [192.168.217.152:7000,
> >> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> >> 192.168.217.156:7000, 192.168.217.159:7000, 192.168.217.160:7000,
> >> 192.168.217.161:7000, 192.168.217.162:7000]
> >> 2011-11-04 09:59:03      3 [192.168.217.152:7000,
> >> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> >> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.161:7000,
> >> 192.168.217.162:7000]
> >> 2011-11-02 10:33:44      2 [192.168.217.152:7000,
> >> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> >> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.162:7000]
> >>
> >>
> >> [root@blade157 ~]# collie cluster info
> >> Cluster status: running
> >>
> >> Cluster created at Wed Nov  2 11:02:26 2011
> >>
> >> Epoch Time           Version
> >> 2011-11-05 07:39:11      9 [192.168.217.154:7000,
> >> 192.168.217.155:7000, 192.168.217.156:7000, 192.168.217.157:7000,
> >> 192.168.217.159:7000, 192.168.217.160:7000]
> >> 2011-11-05 07:39:11      8 [192.168.217.154:7000,
> >> 192.168.217.155:7000, 192.168.217.156:7000, 192.168.217.157:7000,
> >> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.162:7000]
> >> 2011-11-04 18:47:30      7 [192.168.217.153:7000,
> >> 192.168.217.154:7000, 192.168.217.155:7000, 192.168.217.156:7000,
> >> 192.168.217.157:7000, 192.168.217.159:7000, 192.168.217.160:7000,
> >> 192.168.217.162:7000]
> >> 2011-11-04 17:26:26      6 [192.168.217.152:7000,
> >> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> >> 192.168.217.156:7000, 192.168.217.157:7000, 192.168.217.159:7000,
> >> 192.168.217.160:7000, 192.168.217.162:7000]
> >> 2011-11-04 10:59:32      5 [192.168.217.152:7000,
> >> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> >> 192.168.217.156:7000, 192.168.217.157:7000, 192.168.217.159:7000,
> >> 192.168.217.160:7000, 192.168.217.161:7000, 192.168.217.162:7000]
> >> 2011-11-04 10:59:32      4 [192.168.217.152:7000,
> >> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> >> 192.168.217.156:7000, 192.168.217.159:7000, 192.168.217.160:7000,
> >> 192.168.217.161:7000, 192.168.217.162:7000]
> >> 2011-11-02 10:49:34      3 [192.168.217.152:7000,
> >> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> >> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.161:7000,
> >> 192.168.217.162:7000]
> >> 2011-11-02 10:33:44      2 [192.168.217.152:7000,
> >> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> >> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.162:7000]
> >>
> >>
> >> [root@blade159 ~]# collie cluster info
> >> Cluster status: running
> >>
> >> Cluster created at Wed Nov  2 11:02:26 2011
> >>
> >> Epoch Time           Version
> >> 2011-11-04 17:26:11      6 [192.168.217.152:7000,
> >> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> >> 192.168.217.156:7000, 192.168.217.157:7000, 192.168.217.159:7000,
> >> 192.168.217.160:7000, 192.168.217.162:7000]
> >> 2011-11-04 10:59:17      5 [192.168.217.152:7000,
> >> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> >> 192.168.217.156:7000, 192.168.217.157:7000, 192.168.217.159:7000,
> >> 192.168.217.160:7000, 192.168.217.161:7000, 192.168.217.162:7000]
> >> 2011-11-04 09:58:48      4 [192.168.217.152:7000,
> >> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> >> 192.168.217.156:7000, 192.168.217.159:7000, 192.168.217.160:7000,
> >> 192.168.217.161:7000, 192.168.217.162:7000]
> >> 2011-11-02 14:50:37      3 [192.168.217.152:7000,
> >> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> >> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.161:7000,
> >> 192.168.217.162:7000]
> >> 2011-11-02 14:34:46      2 [192.168.217.152:7000,
> >> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> >> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.162:7000]
> >> 2011-11-02 11:02:28      1 [192.168.217.152:7000,
> >> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> >> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.161:7000,
> >> 192.168.217.162:7000]
> >>
> >>
> >> [root@blade160 ~]# collie cluster info
> >> Cluster status: running
> >>
> >> Cluster created at Wed Nov  2 11:02:26 2011
> >>
> >> Epoch Time           Version
> >> 2011-11-04 17:26:26      6 [192.168.217.152:7000,
> >> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> >> 192.168.217.156:7000, 192.168.217.157:7000, 192.168.217.159:7000,
> >> 192.168.217.160:7000, 192.168.217.162:7000]
> >> 2011-11-04 10:59:30      5 [192.168.217.152:7000,
> >> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> >> 192.168.217.156:7000, 192.168.217.157:7000, 192.168.217.159:7000,
> >> 192.168.217.160:7000, 192.168.217.161:7000, 192.168.217.162:7000]
> >> 2011-11-04 09:59:02      4 [192.168.217.152:7000,
> >> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> >> 192.168.217.156:7000, 192.168.217.159:7000, 192.168.217.160:7000,
> >> 192.168.217.161:7000, 192.168.217.162:7000]
> >> 2011-11-02 14:50:46      3 [192.168.217.152:7000,
> >> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> >> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.161:7000,
> >> 192.168.217.162:7000]
> >> 2011-11-02 14:34:55      2 [192.168.217.152:7000,
> >> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> >> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.162:7000]
> >> 2011-11-02 11:02:37      1 [192.168.217.152:7000,
> >> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> >> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.161:7000,
> >> 192.168.217.162:7000]
> >>
> >>
> >> [root@blade161 ~]# collie cluster info
> >> Cluster status: The sheepdog is stopped doing IO, short of living nodes
> >>
> >> Cluster created at Wed Nov  2 11:02:26 2011
> >>
> >> Epoch Time           Version
> >> 2011-11-04 17:26:51     14 [192.168.217.161:7000]
> >> 2011-11-04 17:26:51     13 [192.168.217.161:7000, 192.168.217.162:7000]
> >> 2011-11-04 17:26:51     12 [192.168.217.160:7000,
> >> 192.168.217.161:7000, 192.168.217.162:7000]
> >> 2011-11-04 17:26:51     11 [192.168.217.159:7000,
> >> 192.168.217.160:7000, 192.168.217.161:7000, 192.168.217.162:7000]
> >> 2011-11-04 17:26:48     10 [192.168.217.157:7000,
> >> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.161:7000,
> >> 192.168.217.162:7000]
> >> 2011-11-04 17:26:48      9 [192.168.217.156:7000,
> >> 192.168.217.157:7000, 192.168.217.159:7000, 192.168.217.160:7000,
> >> 192.168.217.161:7000, 192.168.217.162:7000]
> >> 2011-11-04 17:26:48      8 [192.168.217.155:7000,
> >> 192.168.217.156:7000, 192.168.217.157:7000, 192.168.217.159:7000,
> >> 192.168.217.160:7000, 192.168.217.161:7000, 192.168.217.162:7000]
> >> 2011-11-04 17:26:48      7 [192.168.217.154:7000,
> >> 192.168.217.155:7000, 192.168.217.156:7000, 192.168.217.157:7000,
> >> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.161:7000,
> >> 192.168.217.162:7000]
> >>
> >>
> >> [root@blade162 ~]# collie cluster info
> >> failed to connect to localhost:7000, Connection refused
> >> failed to connect to localhost:7000, Connection refused
> > 
> > It seems that a network partition was wrongly detected.
> > 
> > To make the explanation simpler, I'll use the following labels for each
> > node:
> > 
> >     n0: 192.168.217.152
> >     n1: 192.168.217.153
> >     n2: 192.168.217.154
> >     n3: 192.168.217.155
> >     n4: 192.168.217.156
> >     n5: 192.168.217.157
> >     n6: 192.168.217.159
> >     n7: 192.168.217.160
> >     n8: 192.168.217.161
> >     n9: 192.168.217.162
> > 
> > I guess your cluster was split into 5 groups:
> > {n0}, {n1}, {n2, n3, n4, n5, n6, n7}, {n8}, {n9}.
> > 
> >  - n0 received a notification that n[1-9] had left.
> >  - n1 received a notification that n0 and n[2-9] had left.
> >  - n[2-7] received a notification that n0, n1, n8, and n9 had left.
> >  - n8 received a notification that n[0-7] and n9 had left.
> >  - n9 received a notification that n[0-8] had left (and aborted due to the above bug).
> 
> 
> n9 bugged out, so it received a message that included itself on the leave
> list. So n9 received a notification that n[0-9] had left.

Ah, yes, thanks.
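
As a rough illustration of the grouping discussed above, here is a
minimal Python sketch. It is not Sheepdog or corosync code: the names
ALL_NODES, nodes, left_seen, and surviving_groups are made up for this
example, and the leave sets are simply copied from the list above with
the n9 correction applied. It drops any node that saw itself on its own
leave list and groups the rest by the membership view they end up with.

```python
# Illustrative only: not part of Sheepdog or corosync.
ALL_NODES = {f"n{i}" for i in range(10)}


def nodes(*indexes):
    """Build a set of node labels such as {"n0", "n3"} from indexes."""
    return {f"n{i}" for i in indexes}


# For each node, the set of nodes it was told had left.
left_seen = {
    "n0": nodes(*range(1, 10)),
    "n1": nodes(0, *range(2, 10)),
    **{f"n{i}": nodes(0, 1, 8, 9) for i in range(2, 8)},
    "n8": nodes(*range(0, 8), 9),
    "n9": nodes(*range(0, 10)),  # includes n9 itself
}


def surviving_groups(all_nodes, left_seen):
    """Group nodes that share the same remaining membership view."""
    groups = {}
    for node, left in left_seen.items():
        view = frozenset(all_nodes - left)
        if node not in view:
            # The node saw itself leave, so it aborts (the n9 case).
            continue
        groups.setdefault(view, []).append(node)
    return sorted(sorted(members) for members in groups.values())


groups = surviving_groups(ALL_NODES, left_seen)
for members in groups:
    print(members)
```

With this input the sketch reports four surviving groups, {n0}, {n1},
{n2, ..., n7}, and {n8}; n9 is dropped because it saw itself leave.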

> 
> > 
> > Currently, Sheepdog cannot handle this kind of false detection.
> 
> This false detection is passed on to sheep from corosync. I guess it
> is triggered by corosync's built-in timeout mechanism. Corosync's
> heartbeat messages might have been discarded or delayed for whatever reason.
> 
> I suspect the network was fully saturated by heavy I/O, leaving no
> bandwidth for corosync's heartbeat messages.

We probably need to add support for using separate NICs for data
I/O and monitoring.

Thanks,

Kazutaka

> 
> > We may be able to avoid this problem by setting appropriate values in
> > corosync.conf (totem.merge or totem.seqno_unchanged_const?), but I'm
> > not sure.  Does anyone know more about this?
> > 
> 
> 
> I think sheep can work around this problem by using the halt feature. The
> single-node rings {n0}, {n1}, {n8}, {n9} would then halt and the remaining
> group would function as expected.
> 
> But from the log, only blade161 {n8} got into halt status. The other single
> nodes are still running, which is weird.
> 
> Shawn, did you format with the -H or --nohalt option? If not, there might be
> a bug in the halt path.
> 
> Thanks,
> Yuan
> -- 
> sheepdog mailing list
> sheepdog at lists.wpkg.org
> http://lists.wpkg.org/mailman/listinfo/sheepdog
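
To make the halt question above concrete, here is a minimal Python
sketch of the kind of check being discussed. It is not Sheepdog's actual
code path: the function name should_halt, its parameters, and the rule
"halt when fewer nodes are alive than the number of copies" are
assumptions for illustration, and the copy count of 3 is hypothetical,
since the thread does not say how the cluster was formatted.

```python
# Illustrative only: an assumed halt rule, not Sheepdog's implementation.
def should_halt(living_nodes, copies, nohalt=False):
    """Return True if I/O should stop for a group of this size.

    living_nodes: number of nodes in the local membership view
    copies:       assumed number of object replicas (hypothetical value)
    nohalt:       True if the cluster was formatted with halting disabled
    """
    if nohalt:
        return False
    return living_nodes < copies


# Group sizes taken from the partition described in this thread.
for group_size in (1, 6):
    print(group_size, should_halt(group_size, copies=3))
```

Under these assumptions every single-node group would halt unless
halting was disabled at format time, which is why the "running" status
on the other single nodes looks suspicious.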


