[sheepdog-users] Unexpected freeze of sheep on one node
Micha Kersloot
micha at kovoks.nl
Wed Nov 19 10:44:34 CET 2014
Hi,
It looks like a network problem or a failing hard drive to me.
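One way to narrow this down is to probe the address and ports that appear in the quoted log from another node. The following is only a minimal sketch: the helper name `probe` is hypothetical and is not part of sheepdog. A "refused" result means the host answered but nothing is listening on that port, whereas a "timeout" points more towards the network or an overloaded host.

```python
import socket


def probe(host, port, timeout=3.0):
    """Try one TCP connection and classify the outcome.

    Returns "open", "refused", "timeout", or "error: <reason>".
    """
    try:
        # create_connection resolves the address and connects with a timeout
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host replied, but no process is listening on this port
        return "refused"
    except socket.timeout:
        # No reply at all within the timeout: network or very busy host
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. "No route to host"
        return f"error: {exc}"
```

For example, `probe("192.168.5.44", 7000)` and `probe("192.168.5.44", 3333)` would check the two ports reported in the log.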
Kind regards,
Micha Kersloot
Stay up to date and get the latest tips about Zimbra/KovoKs Contact:
http://twitter.com/kovoks
KovoKs B.V. is registered under KvK (Chamber of Commerce) number: 11033334
----- Original Message -----
> From: "Valerio Pachera" <sirio81 at gmail.com>
> To: "Lista sheepdog user" <sheepdog-users at lists.wpkg.org>
> Sent: Wednesday, November 19, 2014 10:32:03 AM
> Subject: Re: [sheepdog-users] Unexpected freeze of sheep on one node
> Last night I re-inserted node id0 (without removing metadata).
> Recovery took a very long time, until 8:49 this morning.
> Once done, sheep was frozen again.
> After 10 minutes I had to kill it.
>
> On node id0 there are no useful messages (sheep.log)
> Nov 19 09:37:40 INFO [main] recover_object_main(863) object recovery
> progress 98%
> Nov 19 09:43:59 INFO [main] recover_object_main(863) object recovery
> progress 99%
> Nov 19 09:49:54 NOTICE [main] cluster_recovery_completion(703) all
> nodes are recovered, epoch 25
>
> On node id1 I see a huge number of these messages
> Nov 19 09:58:33 ERROR [gway 8476] sockfd_cache_get_long(348) fallback
> to non-io connection
> Nov 19 09:58:33 ERROR [gway 8628] connect_to(193) failed to connect
> to 192.168.5.44:7000: Connection refused
> Nov 19 09:58:33 ERROR [gway 8630] connect_to(193) failed to connect
> to 192.168.5.44:7000: Connection refused
> Nov 19 09:58:33 ERROR [gway 6514] connect_to(193) failed to connect
> to 192.168.5.44:3333: Connection refused
> Nov 19 09:58:33 ERROR [gway 8628] connect_to(193) failed to connect
> to 192.168.5.44:7000: Connection refused
>
> After filtering out the 'Connection refused' messages, I see the
> poll-wait and 'failed to connect' messages repeating until I killed the node
> grep 'Nov 19' sheep.log | grep -v 'Connection refused' | grep -v
> 'fallback to non-io connection'
> <cut>
> Nov 19 09:45:04 ERROR [io 7515] sheep_exec_req(1096) failed Failed to
> find requested tag
> Nov 19 09:45:04 ERROR [io 7515] sheep_exec_req(1096) failed Failed to
> find requested tag
> Nov 19 09:45:04 ERROR [io 7515] sheep_exec_req(1096) failed Failed to
> find requested tag
> Nov 19 09:49:54 NOTICE [main] cluster_recovery_completion(703) all
> nodes are recovered, epoch 25
> Nov 19 09:50:08 WARN [gway 8629] wait_forward_request(389) poll
> timeout 1, disks of some nodes or network is busy. Going to poll-wait
> again
> Nov 19 09:50:08 WARN [gway 8628] wait_forward_request(389) poll
> timeout 1, disks of some nodes or network is busy. Going to poll-wait
> again
> Nov 19 09:50:13 WARN [gway 8629] wait_forward_request(389) poll
> timeout 1, disks of some nodes or network is busy. Going to poll-wait
> again
> <cut>
> Nov 19 09:51:19 ERROR [gway 8630] connect_to(193) failed to connect
> to 192.168.5.44:3333: Operation now in progress
>
> I don't understand what is wrong with this node.
> --
> sheepdog-users mailing lists
> sheepdog-users at lists.wpkg.org
> http://lists.wpkg.org/mailman/listinfo/sheepdog-users