[sheepdog-users] Simulating network problems on a 2-node cluster

Valerio Pachera sirio81 at gmail.com
Wed Nov 13 17:06:19 CET 2013


2 nodes, each with 2 NICs (eth0, eth1).
sheep 0.7.0_197_g9f718d2, corosync 1.4.6, Debian Wheezy 64 bit.
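
For reference, corosync runs on eth0. A minimal totem/interface section for
that kind of setup would look roughly like this (the addresses below are just
placeholders, not my real corosync.conf):

  totem {
          version: 2
          interface {
                  ringnumber: 0
                  bindnetaddr: 192.168.2.0   # network of the corosync NIC (eth0), placeholder
                  mcastaddr: 226.94.1.1      # placeholder multicast address
                  mcastport: 5405
          }
  }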

1) Shutdown of the switch (and power back on):
- all nodes still in the cluster
- no recovery
- nothing in sheep.log
- corosync still alive on both nodes
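
(By the way, to repeat these tests without touching the hardware, I suppose a
similar outage could be simulated from the host itself, something like the
sketch below; an administrative 'down' may not behave exactly like a pulled
cable or a dead switch, though.)

  # on the node whose link we want to cut (a sketch, not what I actually ran)
  ip link set eth0 down
  sleep 10
  ip link set eth0 up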

2) Remove the cable from eth0 of node id 1 (the NIC used by corosync):
- 'dog node list' on node id 0 was still showing node id 1
- no recovery started
- nothing reported in sheep.log
- once the cable was back, node id 1 was still in the cluster and nothing
showed up in its sheep.log
- after some minutes I noticed that a new recovery had started
  (these are the last lines of a previous recovery)
  Nov 13 15:50:30   INFO [main] recover_object_main(841) object 53941900001268 is recovered (17609/17610)
  Nov 13 15:50:30   INFO [main] recover_object_main(841) object 539419000005e1 is recovered (17610/17610)
  (and this is the new one)
  Nov 13 16:07:00   INFO [main] recover_object_main(841) object 5394190000089e is recovered (1/17610)
  Nov 13 16:07:00   INFO [main] recover_object_main(841) object 539419000003d5 is recovered (2/17610)
  Nov 13 16:07:00   INFO [main] recover_object_main(841) object 687c40000000bf is recovered (3/17610)
  ...many others...
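
(Next time I will also check corosync's own view of the ring while the cable
is unplugged, e.g.:

  corosync-cfgtool -s    # prints the ring status as seen by the local corosync

so I can tell whether corosync notices the fault even when sheep logs nothing.)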


3) Remove both cables from node id 1 and insert them back after ~10 seconds.

root at test004:~# dog node list
  Id   Host:Port         V-Nodes       Zone
   0   192.168.2.44:7000        128  738371776

root at test005:~# dog node list
  Id   Host:Port         V-Nodes       Zone
   0   192.168.2.44:7000        128  738371776
   1   192.168.2.45:7000        128  755148992

Sheep and corosync are alive on both nodes, but they do not agree on the
cluster membership.
Nothing shows up in either sheep.log.
Both nodes show the correct 'vdi list'.

I try to check a small vdi:

root at test004:~# dog vdi check boot_iso
ABORT: Not enough active nodes for consistency-check

(That makes sense: node id 0 thinks it is alone in the cluster, so it cannot
check the other copies of the objects.)

root at test005:~# dog vdi check boot_iso
100.0 %
finish check&repair boot_iso

I was expecting this check to fail in the same way, but it completed
successfully.

It's as if node id 1 is aware of node id 0, but not vice versa.

This is the same (or almost the same) situation I ran into on a production
cluster.


I ran 'cluster shutdown' on node id 1, but only its own sheep daemon died.
I then ran 'cluster shutdown' on node id 0 and its daemon died as well.
(I was expecting to be forced to use kill -9.)
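
(To double-check that the daemons were really gone before restarting,
something as simple as this would have been enough:)

  pgrep -l sheep || echo "no sheep process running"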

Now I start sheep on both nodes to restart the cluster, but they behave as if
split-brained (each aware only of itself).

root at test004:~# dog node list
  Id   Host:Port         V-Nodes       Zone
   0   192.168.2.44:7000        128  738371776

root at test004:~# dog cluster info
Cluster status: running, auto-recovery enabled
Cluster created at Tue Nov  5 15:35:17 2013
Epoch Time           Version
2013-11-13 16:07:00      9 [192.168.2.44:7000]
2013-11-13 15:40:18      8 [192.168.2.44:7000, 192.168.2.45:7000]


root at test005:~# dog node list
  Id   Host:Port         V-Nodes       Zone
   0   192.168.2.45:7000        128  755148992

root at test005:~# dog cluster info
Cluster status: running, auto-recovery enabled
Cluster created at Tue Nov  5 15:35:17 2013
Epoch Time           Version
2013-11-13 16:24:09      9 [192.168.2.45:7000]
2013-11-13 15:40:18      8 [192.168.2.44:7000, 192.168.2.45:7000]

I stopped sheep and corosync on both nodes and went looking at
/var/log/syslog.

I see many lines like these:
Nov 13 16:07:01 test004 /USR/SBIN/CRON[5369]: (root) CMD
(/root/script/monitor_ram.sh >> /var/log/monitor_ram.log 2>&1)
Nov 13 16:07:04 test004 corosync[2959]:   [TOTEM ] A processor joined or
left the membership and a new membership was formed.
Nov 13 16:07:04 test004 corosync[2959]:   [CPG   ] chosen downlist: sender
r(0) ip(192.168.2.44) ; members(old:1 left:0)
Nov 13 16:07:04 test004 corosync[2959]:   [MAIN  ] Completed service
synchronization, ready to provide service.
Nov 13 16:07:08 test004 corosync[2959]:   [TOTEM ] A processor joined or
left the membership and a new membership was formed.
Nov 13 16:07:08 test004 corosync[2959]:   [CPG   ] chosen downlist: sender
r(0) ip(192.168.2.44) ; members(old:1 left:0)

And that may be correct, since nodes did leave the cluster.

On node id 1 there are some messages I haven't seen before:

Nov 13 16:24:10 test005 sheep: enqueue: log area overrun, dropping message
Nov 13 16:24:10 test005 sheep: enqueue: log area overrun, dropping message
Nov 13 16:24:10 test005 sheep: enqueue: log area overrun, dropping message
...

And I also see many of these:

Nov 13 15:53:28 test005 corosync[2634]:   [TOTEM ] Retransmit List: 19 1a
1b 1c 1d 1e 1f 20 21 22 2d 2e 2f 30 31 32 33 34 35 36 23 24 25 26 27 28 29
2a 2b 2c
Nov 13 15:53:29 test005 corosync[2634]:   [TOTEM ] Retransmit List: 23 24
25 26 27 28 29 2a 2b 2c 19 1a 1b 1c 1d 1e 1f 20 21 22 2d 2e 2f 30 31 32 33
34 35 36
Nov 13 15:53:29 test005 corosync[2634]:   [TOTEM ] Retransmit List: 19 1a
1b 1c 1d 1e 1f 20 21 22 2d 2e 2f 30 31 32 33 34 35 36 23 24 25 26 27 28 29
2a 2b 2c
Nov 13 15:53:29 test005 corosync[2634]:   [TOTEM ] Retransmit List: 23 24
25 26 27 28 29 2a 2b 2c 19 1a 1b 1c 1d 1e 1f 20 21 22 2d 2e 2f 30 31 32 33
34 35 36

and I think that's right: I had removed both of its cables.

After restarting both corosync and sheep, I was able to bring the cluster
back up.

dog node list
  Id   Host:Port         V-Nodes       Zone
   0   192.168.2.44:7000        128  738371776
   1   192.168.2.45:7000        128  755148992

And a recovery has started on node id 1.

root at test004:~# dog node recovery
Nodes In Recovery:
  Id   Host:Port         V-Nodes       Zone       Progress
   1   192.168.2.45:7000     128  755148992        4.1%

I was not expecting to see a recovery start.
In this case node id 1 is the one recovering, so it means node id 0 is "the
good one".
I wonder if this simply depends on the order in which I started the sheep
daemons.
If I had started the sheep daemon first on test005 and then on test004, maybe
test004 would have been the one to be recovered...?
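
Next time I will try reversing the start order to test this, something like
the sketch below (assuming the default store directory /var/lib/sheepdog):

  # start the sheep daemon on test005 first
  sheep /var/lib/sheepdog     # run on test005

  # then start it on test004
  sheep /var/lib/sheepdog     # run on test004

  # and watch which node ends up in recovery
  dog node recovery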


Finally

root at test005:~# dog cluster info
Cluster status: running, auto-recovery enabled
Cluster created at Tue Nov  5 15:35:17 2013
Epoch Time           Version
2013-11-13 16:41:28     10 [192.168.2.44:7000, 192.168.2.45:7000]
2013-11-13 16:24:09      9 [192.168.2.45:7000]
2013-11-13 15:40:18      8 [192.168.2.44:7000, 192.168.2.45:7000]