[sheepdog-users] Unexpected freeze of sheep on one node
Valerio Pachera
sirio81 at gmail.com
Wed Nov 19 10:32:03 CET 2014
Last night I added node id0 back to the cluster (without removing its metadata).
Recovery took a very long time, until 8:49 this morning.
Once it finished, sheep froze again.
After 10 minutes I had to kill it.
On node id0 there are no useful messages in sheep.log:
Nov 19 09:37:40 INFO [main] recover_object_main(863) object recovery progress 98%
Nov 19 09:43:59 INFO [main] recover_object_main(863) object recovery progress 99%
Nov 19 09:49:54 NOTICE [main] cluster_recovery_completion(703) all nodes are recovered, epoch 25
On node id1 I see a huge number of these messages:
Nov 19 09:58:33 ERROR [gway 8476] sockfd_cache_get_long(348) fallback to non-io connection
Nov 19 09:58:33 ERROR [gway 8628] connect_to(193) failed to connect to 192.168.5.44:7000: Connection refused
Nov 19 09:58:33 ERROR [gway 8630] connect_to(193) failed to connect to 192.168.5.44:7000: Connection refused
Nov 19 09:58:33 ERROR [gway 6514] connect_to(193) failed to connect to 192.168.5.44:3333: Connection refused
Nov 19 09:58:33 ERROR [gway 8628] connect_to(193) failed to connect to 192.168.5.44:7000: Connection refused
After filtering out these 'Connection refused' messages, I see the
poll-wait and 'failed to connect' messages repeating until I killed the node:
grep 'Nov 19' sheep.log | grep -v 'Connection refused' | grep -v 'fallback to non-io connection'
<cut>
Nov 19 09:45:04 ERROR [io 7515] sheep_exec_req(1096) failed Failed to find requested tag
Nov 19 09:45:04 ERROR [io 7515] sheep_exec_req(1096) failed Failed to find requested tag
Nov 19 09:45:04 ERROR [io 7515] sheep_exec_req(1096) failed Failed to find requested tag
Nov 19 09:49:54 NOTICE [main] cluster_recovery_completion(703) all nodes are recovered, epoch 25
Nov 19 09:50:08 WARN [gway 8629] wait_forward_request(389) poll timeout 1, disks of some nodes or network is busy. Going to poll-wait again
Nov 19 09:50:08 WARN [gway 8628] wait_forward_request(389) poll timeout 1, disks of some nodes or network is busy. Going to poll-wait again
Nov 19 09:50:13 WARN [gway 8629] wait_forward_request(389) poll timeout 1, disks of some nodes or network is busy. Going to poll-wait again
<cut>
Nov 19 09:51:19 ERROR [gway 8630] connect_to(193) failed to connect to 192.168.5.44:3333: Operation now in progress
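Since the grep filter above still leaves many repeated lines, a per-message count may be easier to read. The following is only a rough sketch: summarize_sheep_log is a hypothetical helper name (not part of sheepdog), and it assumes every line has the layout seen in the excerpts here (month, day, time, level, a bracketed thread tag, then function(line)).

```shell
# Hypothetical helper, not a sheepdog tool: count log lines per
# (level, function) for one day, most frequent first.
# Assumes the line layout shown in the excerpts above.
summarize_sheep_log() {
    day="$1"    # date prefix to select, e.g. 'Nov 19'
    file="$2"   # path to the log, e.g. sheep.log
    grep "^$day" "$file" | awk '
        {
            level = $4
            # The thread tag is "[main]" (one field) or "[gway 8476]"
            # (two fields); the function name is the first field after
            # the one that ends in "]".
            name = ""
            for (i = 5; i < NF; i++) {
                if ($i ~ /\]$/) { name = $(i + 1); break }
            }
            sub(/\([0-9]+\)$/, "", name)   # drop the source line number
            count[level " " name]++
        }
        END { for (k in count) print count[k], k }
    ' | sort -rn
}
```

It would be called as: summarize_sheep_log 'Nov 19' sheep.log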
I don't understand what is wrong with this node.