[sheepdog-users] Automatic disconnection of client during recovery

Valerio Pachera sirio81 at gmail.com
Thu Dec 12 17:08:44 CET 2013


Hi, I wanted to check whether my test cluster would have problems when
adding a node with a single NIC (while the others use 2).
I've upgraded to the latest sheepdog version: 0.7.0_197_g9f718d2.

I removed a node (test006) with 'dog node kill' and waited for the recovery
to finish.

When the recovery was completed, I was about to re-add the node, but I
noticed that 'dog node info' was showing a single node!

This is bizarre!
sheep is alive on test004, test005 and test007, but only test007 is in the
cluster.
Corosync is alive on all nodes.

2013-12-12 16:15:14     35 [192.168.2.47:7000]
2013-12-12 16:14:34     34 [192.168.2.45:7000, 192.168.2.47:7000]
2013-12-12 16:06:45     33 [192.168.2.44:7000, 192.168.2.45:7000, 192.168.2.47:7000]
2013-12-11 14:38:48     32 [192.168.2.44:7000, 192.168.2.45:7000, 192.168.2.46:7000, 192.168.2.47:7000]
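
To make the sequence easier to read, here is a minimal Python sketch that diffs the node list between consecutive epochs. It assumes the "timestamp, epoch, [node list]" layout shown in the listing above, with the wrapped lines joined by hand. The function names (parse_epochs, membership_changes) are made up for illustration and are not part of sheepdog or dog.

```python
import re

# Epoch listing copied from this message, with wrapped lines joined by hand.
EPOCH_LOG = """\
2013-12-12 16:15:14     35 [192.168.2.47:7000]
2013-12-12 16:14:34     34 [192.168.2.45:7000, 192.168.2.47:7000]
2013-12-12 16:06:45     33 [192.168.2.44:7000, 192.168.2.45:7000, 192.168.2.47:7000]
2013-12-11 14:38:48     32 [192.168.2.44:7000, 192.168.2.45:7000, 192.168.2.46:7000, 192.168.2.47:7000]
"""

# Assumed line layout: "<date> <time>  <epoch> [<node>, <node>, ...]"
LINE_RE = re.compile(r"^(\S+ \S+)\s+(\d+)\s+\[(.*)\]$")


def parse_epochs(text):
    """Return a list of (epoch, timestamp, set_of_nodes), oldest epoch first."""
    epochs = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that does not look like an epoch line
        nodes = {n.strip() for n in match.group(3).split(",") if n.strip()}
        epochs.append((int(match.group(2)), match.group(1), nodes))
    return sorted(epochs, key=lambda entry: entry[0])


def membership_changes(epochs):
    """Return (epoch, timestamp, joined, left) for each epoch transition."""
    changes = []
    for (_, _, previous), (epoch, stamp, current) in zip(epochs, epochs[1:]):
        joined = sorted(current - previous)
        left = sorted(previous - current)
        changes.append((epoch, stamp, joined, left))
    return changes


if __name__ == "__main__":
    for epoch, stamp, joined, left in membership_changes(parse_epochs(EPOCH_LOG)):
        print(f"epoch {epoch} at {stamp}: joined={joined} left={left}")
```

Run against the listing above, it reports 192.168.2.46 leaving at epoch 33, 192.168.2.44 at epoch 34 and 192.168.2.45 at epoch 35, with no node joining in between. Only the mapping of 192.168.2.44 to test004 is confirmed by the corosync log further down.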

Unfortunately, I removed all the sheep.log files before noticing this issue.

This is the other information I can get (from test004):

/var/log/messages
Dec 12 16:01:20 test004 sheep: logger pid 6792 starting
Dec 12 16:02:24 test004 sheep: logger pid 6792 stopped
Dec 12 16:05:43 test004 sheep: logger pid 6873 starting

/var/log/syslog
Dec 12 16:01:20 test004 sheep: logger pid 6792 starting
Dec 12 16:02:01 test004 /USR/SBIN/CRON[6807]: (root) CMD (/root/script/monitor_ram.sh >> /var/log/monitor_ram.log 2>&1)
Dec 12 16:02:24 test004 sheep: logger pid 6792 stopped
Dec 12 16:03:01 test004 /USR/SBIN/CRON[6822]: (root) CMD (/root/script/monitor_ram.sh >> /var/log/monitor_ram.log 2>&1)
Dec 12 16:04:01 test004 /USR/SBIN/CRON[6828]: (root) CMD (/root/script/monitor_ram.sh >> /var/log/monitor_ram.log 2>&1)
Dec 12 16:05:01 test004 /USR/SBIN/CRON[6835]: (root) CMD (/root/script/monitor_ram.sh >> /var/log/monitor_ram.log 2>&1)
Dec 12 16:05:01 test004 /USR/SBIN/CRON[6836]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Dec 12 16:05:27 test004 corosync[3110]:   [SERV  ] Unloading all Corosync service engines.
Dec 12 16:05:27 test004 corosync[3110]:   [SERV  ] Service engine unloaded: corosync extended virtual synchrony service
Dec 12 16:05:27 test004 corosync[3110]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Dec 12 16:05:27 test004 corosync[3110]:   [SERV  ] Service engine unloaded: corosync configuration service
Dec 12 16:05:27 test004 corosync[3110]:   [SERV  ] Service engine unloaded: corosync cluster closed process group service v1.01
Dec 12 16:05:27 test004 corosync[3110]:   [SERV  ] Service engine unloaded: corosync cluster config database access v1.01
Dec 12 16:05:27 test004 corosync[3110]:   [SERV  ] Service engine unloaded: corosync profile loading service
Dec 12 16:05:27 test004 corosync[3110]:   [SERV  ] Service engine unloaded: corosync cluster quorum service v0.1
Dec 12 16:05:27 test004 corosync[3110]:   [MAIN  ] Corosync Cluster Engine exiting with status 0 at main.c:1939.
Dec 12 16:05:41 test004 corosync[6865]:   [MAIN  ] Corosync Cluster Engine ('1.4.6'): started and ready to provide service.
Dec 12 16:05:41 test004 corosync[6865]:   [MAIN  ] Corosync built-in features:
Dec 12 16:05:41 test004 corosync[6865]:   [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Dec 12 16:05:41 test004 corosync[6865]:   [TOTEM ] Initializing transport (UDP/IP Multicast).
Dec 12 16:05:41 test004 corosync[6865]:   [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Dec 12 16:05:41 test004 corosync[6865]:   [TOTEM ] The network interface [192.168.2.44] is now up.
Dec 12 16:05:41 test004 corosync[6865]:   [SERV  ] Service failed to load 'pacemaker'.
Dec 12 16:05:41 test004 corosync[6865]:   [SERV  ] Service engine loaded: corosync extended virtual synchrony service
Dec 12 16:05:41 test004 corosync[6865]:   [SERV  ] Service engine loaded: corosync configuration service
Dec 12 16:05:41 test004 corosync[6865]:   [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01
Dec 12 16:05:41 test004 corosync[6865]:   [SERV  ] Service engine loaded: corosync cluster config database access v1.01
Dec 12 16:05:41 test004 corosync[6865]:   [SERV  ] Service engine loaded: corosync profile loading service
Dec 12 16:05:41 test004 corosync[6865]:   [SERV  ] Service engine loaded: corosync cluster quorum service v0.1
Dec 12 16:05:41 test004 corosync[6865]:   [MAIN  ] Compatibility mode set to whitetank.  Using V1 and V2 of the synchronization engine.
Dec 12 16:05:41 test004 corosync[6865]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Dec 12 16:05:41 test004 corosync[6865]:   [CPG   ] chosen downlist: sender r(0) ip(192.168.2.44) ; members(old:0 left:0)
Dec 12 16:05:41 test004 corosync[6865]:   [MAIN  ] Completed service synchronization, ready to provide service.
Dec 12 16:05:43 test004 sheep: logger pid 6873 starting
Dec 12 16:05:45 test004 corosync[6865]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Dec 12 16:05:45 test004 corosync[6865]:   [CPG   ] chosen downlist: sender r(0) ip(192.168.2.44) ; members(old:1 left:0)
Dec 12 16:05:45 test004 corosync[6865]:   [MAIN  ] Completed service synchronization, ready to provide service.
Dec 12 16:05:48 test004 corosync[6865]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Dec 12 16:05:48 test004 corosync[6865]:   [CPG   ] chosen downlist: sender r(0) ip(192.168.2.44) ; members(old:2 left:0)
Dec 12 16:05:48 test004 corosync[6865]:   [MAIN  ] Completed service synchronization, ready to provide service.
Dec 12 16:05:52 test004 corosync[6865]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Dec 12 16:05:52 test004 corosync[6865]:   [CPG   ] chosen downlist: sender r(0) ip(192.168.2.44) ; members(old:3 left:0)
Dec 12 16:05:52 test004 corosync[6865]:   [MAIN  ] Completed service synchronization, ready to provide service.