[Sheepdog] Cluster appears down but nodes report different epochs

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Tue Nov 8 06:33:39 CET 2011


At Mon, 7 Nov 2011 10:03:19 -0500,
Shawn Moore wrote:
> 
> When I checked on the cluster this morning I see the following from
> cluster info.  A sheep and corosync process was found on all nodes
> except blade162 which didn't have a sheep process but did have a
> corosync one.  I'm not sure what has happened.  We have not had a

In blade162.log:

  Nov 05 00:06:30 sd_leave_handler(1222) Network Patition Bug: I should have exited.

This is probably a corosync bug, and Yunkai is working on a fix.

  http://lists.wpkg.org/pipermail/sheepdog/2011-November/001835.html


> network interruption that we are aware of as all nodes are on the same
> switch (along with countless other production systems).  Logs from
> each node can be found
> http://www.stormpoint.com/files/sd_2011-11-07.zip.  Total
> un-compressed size is ~ 254MB and this download size is around 21MB.
> When I left Friday, this is how our cluster looked:
> 
> All nodes were running version 0.2.4_63_gd56e3b6
> 
>    Idx - Host:Port          Vnodes       Zone
> ---------------------------------------------
>      0 - 192.168.217.152:7000 	64          1
>      1 - 192.168.217.153:7000 	64          1
>      2 - 192.168.217.154:7000 	64          1
>      3 - 192.168.217.155:7000 	64          1
>      4 - 192.168.217.156:7000 	64          1
>      5 - 192.168.217.157:7000 	64          2
>      6 - 192.168.217.159:7000 	64          2
>      7 - 192.168.217.160:7000 	64          2
>      8 - 192.168.217.161:7000 	64          2
>      9 - 192.168.217.162:7000 	64          2
> 
> [root at blade152 sheep]# collie cluster info
> Cluster status: running
> 
> Cluster created at Wed Nov  2 11:02:26 2011
> 
> Epoch Time           Version
> 2011-11-04 17:26:22     14 [192.168.217.152:7000]
> 2011-11-04 17:26:22     13 [192.168.217.152:7000, 192.168.217.162:7000]
> 2011-11-04 17:26:22     12 [192.168.217.152:7000,
> 192.168.217.160:7000, 192.168.217.162:7000]
> 2011-11-04 17:26:22     11 [192.168.217.152:7000,
> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.162:7000]
> 2011-11-04 17:26:22     10 [192.168.217.152:7000,
> 192.168.217.157:7000, 192.168.217.159:7000, 192.168.217.160:7000,
> 192.168.217.162:7000]
> 2011-11-04 17:26:21      9 [192.168.217.152:7000,
> 192.168.217.156:7000, 192.168.217.157:7000, 192.168.217.159:7000,
> 192.168.217.160:7000, 192.168.217.162:7000]
> 2011-11-04 17:26:21      8 [192.168.217.152:7000,
> 192.168.217.155:7000, 192.168.217.156:7000, 192.168.217.157:7000,
> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.162:7000]
> 2011-11-04 17:26:21      7 [192.168.217.152:7000,
> 192.168.217.154:7000, 192.168.217.155:7000, 192.168.217.156:7000,
> 192.168.217.157:7000, 192.168.217.159:7000, 192.168.217.160:7000,
> 192.168.217.162:7000]
> 
> 
> [root at blade153 ~]# collie cluster info
> Cluster status: running
> 
> Cluster created at Wed Nov  2 11:02:26 2011
> 
> Epoch Time           Version
> 2011-11-05 00:05:19     14 [192.168.217.153:7000]
> 2011-11-05 00:05:19     13 [192.168.217.153:7000, 192.168.217.162:7000]
> 2011-11-05 00:05:19     12 [192.168.217.153:7000,
> 192.168.217.160:7000, 192.168.217.162:7000]
> 2011-11-05 00:05:19     11 [192.168.217.153:7000,
> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.162:7000]
> 2011-11-05 00:05:19     10 [192.168.217.153:7000,
> 192.168.217.157:7000, 192.168.217.159:7000, 192.168.217.160:7000,
> 192.168.217.162:7000]
> 2011-11-05 00:05:19      9 [192.168.217.153:7000,
> 192.168.217.156:7000, 192.168.217.157:7000, 192.168.217.159:7000,
> 192.168.217.160:7000, 192.168.217.162:7000]
> 2011-11-05 00:05:18      8 [192.168.217.153:7000,
> 192.168.217.155:7000, 192.168.217.156:7000, 192.168.217.157:7000,
> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.162:7000]
> 2011-11-05 00:05:18      7 [192.168.217.153:7000,
> 192.168.217.154:7000, 192.168.217.155:7000, 192.168.217.156:7000,
> 192.168.217.157:7000, 192.168.217.159:7000, 192.168.217.160:7000,
> 192.168.217.162:7000]
> 
> 
> [root at blade154 ~]# collie cluster info
> Cluster status: running
> 
> Cluster created at Wed Nov  2 11:02:26 2011
> 
> Epoch Time           Version
> 2011-11-04 13:25:06      6 [192.168.217.152:7000,
> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> 192.168.217.156:7000, 192.168.217.157:7000, 192.168.217.159:7000,
> 192.168.217.160:7000, 192.168.217.162:7000]
> 2011-11-04 06:58:12      5 [192.168.217.152:7000,
> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> 192.168.217.156:7000, 192.168.217.157:7000, 192.168.217.159:7000,
> 192.168.217.160:7000, 192.168.217.161:7000, 192.168.217.162:7000]
> 2011-11-04 05:57:43      4 [192.168.217.152:7000,
> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> 192.168.217.156:7000, 192.168.217.159:7000, 192.168.217.160:7000,
> 192.168.217.161:7000, 192.168.217.162:7000]
> 2011-11-02 10:49:34      3 [192.168.217.152:7000,
> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.161:7000,
> 192.168.217.162:7000]
> 2011-11-02 10:33:44      2 [192.168.217.152:7000,
> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.162:7000]
> 2011-11-02 07:01:26      1 [192.168.217.152:7000,
> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.161:7000,
> 192.168.217.162:7000]
> 
> 
> [root at blade155 ~]# collie cluster info
> Cluster status: running
> 
> Cluster created at Wed Nov  2 11:02:26 2011
> 
> Epoch Time           Version
> 2011-11-04 13:24:42      6 [192.168.217.152:7000,
> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> 192.168.217.156:7000, 192.168.217.157:7000, 192.168.217.159:7000,
> 192.168.217.160:7000, 192.168.217.162:7000]
> 2011-11-04 06:57:48      5 [192.168.217.152:7000,
> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> 192.168.217.156:7000, 192.168.217.157:7000, 192.168.217.159:7000,
> 192.168.217.160:7000, 192.168.217.161:7000, 192.168.217.162:7000]
> 2011-11-04 05:57:19      4 [192.168.217.152:7000,
> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> 192.168.217.156:7000, 192.168.217.159:7000, 192.168.217.160:7000,
> 192.168.217.161:7000, 192.168.217.162:7000]
> 2011-11-02 10:49:07      3 [192.168.217.152:7000,
> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.161:7000,
> 192.168.217.162:7000]
> 2011-11-02 10:33:17      2 [192.168.217.152:7000,
> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.162:7000]
> 2011-11-02 07:00:59      1 [192.168.217.152:7000,
> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.161:7000,
> 192.168.217.162:7000]
> 
> 
> [root at blade156 ~]# collie cluster info
> Cluster status: running
> 
> Cluster created at Wed Nov  2 11:02:26 2011
> 
> Epoch Time           Version
> 2011-11-05 07:39:11      9 [192.168.217.154:7000,
> 192.168.217.155:7000, 192.168.217.156:7000, 192.168.217.157:7000,
> 192.168.217.159:7000, 192.168.217.160:7000]
> 2011-11-05 07:39:11      8 [192.168.217.154:7000,
> 192.168.217.155:7000, 192.168.217.156:7000, 192.168.217.157:7000,
> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.162:7000]
> 2011-11-04 18:47:30      7 [192.168.217.153:7000,
> 192.168.217.154:7000, 192.168.217.155:7000, 192.168.217.156:7000,
> 192.168.217.157:7000, 192.168.217.159:7000, 192.168.217.160:7000,
> 192.168.217.162:7000]
> 2011-11-04 17:26:26      6 [192.168.217.152:7000,
> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> 192.168.217.156:7000, 192.168.217.157:7000, 192.168.217.159:7000,
> 192.168.217.160:7000, 192.168.217.162:7000]
> 2011-11-04 10:59:30      5 [192.168.217.152:7000,
> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> 192.168.217.156:7000, 192.168.217.157:7000, 192.168.217.159:7000,
> 192.168.217.160:7000, 192.168.217.161:7000, 192.168.217.162:7000]
> 2011-11-04 09:59:03      4 [192.168.217.152:7000,
> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> 192.168.217.156:7000, 192.168.217.159:7000, 192.168.217.160:7000,
> 192.168.217.161:7000, 192.168.217.162:7000]
> 2011-11-04 09:59:03      3 [192.168.217.152:7000,
> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.161:7000,
> 192.168.217.162:7000]
> 2011-11-02 10:33:44      2 [192.168.217.152:7000,
> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.162:7000]
> 
> 
> [root at blade157 ~]# collie cluster info
> Cluster status: running
> 
> Cluster created at Wed Nov  2 11:02:26 2011
> 
> Epoch Time           Version
> 2011-11-05 07:39:11      9 [192.168.217.154:7000,
> 192.168.217.155:7000, 192.168.217.156:7000, 192.168.217.157:7000,
> 192.168.217.159:7000, 192.168.217.160:7000]
> 2011-11-05 07:39:11      8 [192.168.217.154:7000,
> 192.168.217.155:7000, 192.168.217.156:7000, 192.168.217.157:7000,
> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.162:7000]
> 2011-11-04 18:47:30      7 [192.168.217.153:7000,
> 192.168.217.154:7000, 192.168.217.155:7000, 192.168.217.156:7000,
> 192.168.217.157:7000, 192.168.217.159:7000, 192.168.217.160:7000,
> 192.168.217.162:7000]
> 2011-11-04 17:26:26      6 [192.168.217.152:7000,
> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> 192.168.217.156:7000, 192.168.217.157:7000, 192.168.217.159:7000,
> 192.168.217.160:7000, 192.168.217.162:7000]
> 2011-11-04 10:59:32      5 [192.168.217.152:7000,
> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> 192.168.217.156:7000, 192.168.217.157:7000, 192.168.217.159:7000,
> 192.168.217.160:7000, 192.168.217.161:7000, 192.168.217.162:7000]
> 2011-11-04 10:59:32      4 [192.168.217.152:7000,
> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> 192.168.217.156:7000, 192.168.217.159:7000, 192.168.217.160:7000,
> 192.168.217.161:7000, 192.168.217.162:7000]
> 2011-11-02 10:49:34      3 [192.168.217.152:7000,
> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.161:7000,
> 192.168.217.162:7000]
> 2011-11-02 10:33:44      2 [192.168.217.152:7000,
> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.162:7000]
> 
> 
> [root at blade159 ~]# collie cluster info
> Cluster status: running
> 
> Cluster created at Wed Nov  2 11:02:26 2011
> 
> Epoch Time           Version
> 2011-11-04 17:26:11      6 [192.168.217.152:7000,
> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> 192.168.217.156:7000, 192.168.217.157:7000, 192.168.217.159:7000,
> 192.168.217.160:7000, 192.168.217.162:7000]
> 2011-11-04 10:59:17      5 [192.168.217.152:7000,
> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> 192.168.217.156:7000, 192.168.217.157:7000, 192.168.217.159:7000,
> 192.168.217.160:7000, 192.168.217.161:7000, 192.168.217.162:7000]
> 2011-11-04 09:58:48      4 [192.168.217.152:7000,
> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> 192.168.217.156:7000, 192.168.217.159:7000, 192.168.217.160:7000,
> 192.168.217.161:7000, 192.168.217.162:7000]
> 2011-11-02 14:50:37      3 [192.168.217.152:7000,
> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.161:7000,
> 192.168.217.162:7000]
> 2011-11-02 14:34:46      2 [192.168.217.152:7000,
> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.162:7000]
> 2011-11-02 11:02:28      1 [192.168.217.152:7000,
> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.161:7000,
> 192.168.217.162:7000]
> 
> 
> [root at blade160 ~]# collie cluster info
> Cluster status: running
> 
> Cluster created at Wed Nov  2 11:02:26 2011
> 
> Epoch Time           Version
> 2011-11-04 17:26:26      6 [192.168.217.152:7000,
> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> 192.168.217.156:7000, 192.168.217.157:7000, 192.168.217.159:7000,
> 192.168.217.160:7000, 192.168.217.162:7000]
> 2011-11-04 10:59:30      5 [192.168.217.152:7000,
> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> 192.168.217.156:7000, 192.168.217.157:7000, 192.168.217.159:7000,
> 192.168.217.160:7000, 192.168.217.161:7000, 192.168.217.162:7000]
> 2011-11-04 09:59:02      4 [192.168.217.152:7000,
> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> 192.168.217.156:7000, 192.168.217.159:7000, 192.168.217.160:7000,
> 192.168.217.161:7000, 192.168.217.162:7000]
> 2011-11-02 14:50:46      3 [192.168.217.152:7000,
> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.161:7000,
> 192.168.217.162:7000]
> 2011-11-02 14:34:55      2 [192.168.217.152:7000,
> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.162:7000]
> 2011-11-02 11:02:37      1 [192.168.217.152:7000,
> 192.168.217.153:7000, 192.168.217.154:7000, 192.168.217.155:7000,
> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.161:7000,
> 192.168.217.162:7000]
> 
> 
> [root at blade161 ~]# collie cluster info
> Cluster status: The sheepdog is stopped doing IO, short of living nodes
> 
> Cluster created at Wed Nov  2 11:02:26 2011
> 
> Epoch Time           Version
> 2011-11-04 17:26:51     14 [192.168.217.161:7000]
> 2011-11-04 17:26:51     13 [192.168.217.161:7000, 192.168.217.162:7000]
> 2011-11-04 17:26:51     12 [192.168.217.160:7000,
> 192.168.217.161:7000, 192.168.217.162:7000]
> 2011-11-04 17:26:51     11 [192.168.217.159:7000,
> 192.168.217.160:7000, 192.168.217.161:7000, 192.168.217.162:7000]
> 2011-11-04 17:26:48     10 [192.168.217.157:7000,
> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.161:7000,
> 192.168.217.162:7000]
> 2011-11-04 17:26:48      9 [192.168.217.156:7000,
> 192.168.217.157:7000, 192.168.217.159:7000, 192.168.217.160:7000,
> 192.168.217.161:7000, 192.168.217.162:7000]
> 2011-11-04 17:26:48      8 [192.168.217.155:7000,
> 192.168.217.156:7000, 192.168.217.157:7000, 192.168.217.159:7000,
> 192.168.217.160:7000, 192.168.217.161:7000, 192.168.217.162:7000]
> 2011-11-04 17:26:48      7 [192.168.217.154:7000,
> 192.168.217.155:7000, 192.168.217.156:7000, 192.168.217.157:7000,
> 192.168.217.159:7000, 192.168.217.160:7000, 192.168.217.161:7000,
> 192.168.217.162:7000]
> 
> 
> [root at blade162 ~]# collie cluster info
> failed to connect to localhost:7000, Connection refused
> failed to connect to localhost:7000, Connection refused

It seems that a network partition was wrongly detected.

To simplify the explanation, I'll use the following labels for each
node:

    n0: 192.168.217.152
    n1: 192.168.217.153
    n2: 192.168.217.154
    n3: 192.168.217.155
    n4: 192.168.217.156
    n5: 192.168.217.157
    n6: 192.168.217.159
    n7: 192.168.217.160
    n8: 192.168.217.161
    n9: 192.168.217.162

I guess your cluster was split into 5 groups:
{n0}, {n1}, {n2, n3, n4, n5, n6, n7}, {n8}, {n9}.

 - n0 received a notification that n[1-9] had left.
 - n1 received a notification that n0 and n[2-9] had left.
 - n[2-7] received a notification that n0, n1, n8, and n9 had left.
 - n8 received a notification that n[0-7] and n9 had left.
 - n9 received a notification that n[0-8] had left (and aborted due to the above bug).
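
The list above follows mechanically from the five groups: each node is
told that every node outside its own group has left.  A minimal Python
sketch of that reasoning follows.  The names GROUPS, departed_as_seen_by
and label are made up for illustration and are not part of Sheepdog or
corosync; only the labels n0..n9 and the grouping come from this message.

```python
# Hypothetical illustration only: node labels n0..n9 and the five groups
# are taken from the message above; nothing here is Sheepdog code.
ALL_NODES = set(range(10))
GROUPS = [{0}, {1}, {2, 3, 4, 5, 6, 7}, {8}, {9}]


def departed_as_seen_by(node, groups=GROUPS, all_nodes=ALL_NODES):
    """Return the nodes that `node` is told have left the cluster.

    A node keeps seeing the members of its own group, so everything
    outside that group is reported to it as having left.
    """
    for group in groups:
        if node in group:
            return all_nodes - group
    raise ValueError(f"n{node} is not in any group")


def label(nodes):
    """Format a set of node indices as 'n0, n1, ...'."""
    return ", ".join(f"n{i}" for i in sorted(nodes))


if __name__ == "__main__":
    for n in sorted(ALL_NODES):
        print(f"n{n} is notified that these nodes left: "
              f"{label(departed_as_seen_by(n))}")
```

Running it prints one line per node, which can be compared with the
list above.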

Currently, Sheepdog cannot handle this kind of false detection.

We may be able to avoid this problem by setting appropriate values in
corosync.conf (totem.merge or totem.seqno_unchanged_const?), but I'm
not sure.  Does anyone know more about this?


Thanks,

Kazutaka


