[sheepdog-users] Unexpected freeze of sheep on one node

Micha Kersloot micha at kovoks.nl
Wed Nov 19 12:11:26 CET 2014


Hi,

Can you explain those bridge (br51) topology changes? They could come from starting or stopping virtual machines, for example. We had some problems with dying disks lately: no SMART errors, but the disks were terribly slow and then died the next day. The thing that showed something was wrong was a noticeable increase in the system load.
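
To check whether the bridge messages that only show up in dmesg fall anywhere near the time of the freeze, the kernel uptime stamps can be converted to wall-clock time. Below is a minimal Python sketch. It assumes one anchor pair taken from a kern.log line quoted further down (syslog time plus the uptime stamp on the same line), and it assumes the year 2014, since syslog lines do not carry one. The function names are made up for illustration, and the result is only an estimate because the kernel clock can drift from wall-clock time.

```python
from datetime import datetime, timedelta

# Anchor pair taken from one kern.log line quoted in this thread:
# the syslog wall-clock time and the kernel uptime stamp on that same line.
# The year is an assumption (syslog lines do not include it).
ANCHOR_WALL = datetime(2014, 11, 18, 1, 47, 50)
ANCHOR_UPTIME = 3914290.742630


def dmesg_to_wallclock(uptime_seconds,
                       anchor_wall=ANCHOR_WALL,
                       anchor_uptime=ANCHOR_UPTIME):
    """Estimate the wall-clock time of a dmesg stamp from one known anchor."""
    return anchor_wall + timedelta(seconds=uptime_seconds - anchor_uptime)


def parse_stamp(line):
    """Return the float inside the leading '[...]' of a dmesg line, or None."""
    line = line.strip()
    if not line.startswith("["):
        return None
    end = line.find("]")
    if end == -1:
        return None
    try:
        return float(line[1:end])
    except ValueError:
        return None


if __name__ == "__main__":
    sample = "[3914310.533313] br51: topology change detected, sending tcn bpdu"
    print(dmesg_to_wallclock(parse_stamp(sample)))
```

Feeding every line of `dmesg | grep 'br51: topology change detected'` through `parse_stamp` and `dmesg_to_wallclock` would show whether any of the events cluster around the time of the issue.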

You could set up an nc (netcat) listener on port 7000 and try to push some data to it from another node, to see whether something is wrong in a router/switch specific to that port.
Apart from that, flushing the node and adding it back as a fresh node seems the most logical next step to me.
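
If nc is not at hand, the same port test can be sketched in Python: one side accepts a connection and discards whatever arrives, the other side pushes a fixed amount of data and times it. This is only a rough sketch, not a benchmark. The function names, the payload size and the loopback self-check are my own assumptions, and the measured time only covers handing the data to the kernel.

```python
import socket
import threading
import time


def run_sink(listen_sock, result):
    """Accept one connection, discard everything received, record the byte count."""
    conn, _ = listen_sock.accept()
    total = 0
    with conn:
        while True:
            chunk = conn.recv(65536)
            if not chunk:
                break
            total += len(chunk)
    result["bytes"] = total


def send_data(host, port, megabytes=16, timeout=10.0):
    """Send `megabytes` MiB of zeros to host:port; return (bytes_sent, seconds)."""
    payload = bytes(1024 * 1024)
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as s:
        for _ in range(megabytes):
            s.sendall(payload)
    return megabytes * len(payload), time.monotonic() - start


def loopback_demo(megabytes=4):
    """Self-check on 127.0.0.1 with an ephemeral port; no real network involved."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]
    result = {}
    t = threading.Thread(target=run_sink, args=(listener, result))
    t.start()
    sent, seconds = send_data("127.0.0.1", port, megabytes)
    t.join(timeout=10)
    listener.close()
    return sent, result.get("bytes"), seconds


if __name__ == "__main__":
    sent, received, seconds = loopback_demo()
    print(f"sent={sent} received={received} in {seconds:.3f}s")
```

For the real test, the listener would be bound to the I/O address of the suspect node on port 7000 (the port has to be free, so sheep must not be listening there during the test), and `send_data` would be called from another node with that address. A stall, a timeout or a byte count that does not match would point at the network path for that port.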

Kind regards,

Micha Kersloot

Stay up to date and receive the latest tips about Zimbra/KovoKs Contact:
http://twitter.com/kovoks

KovoKs B.V. is registered under KvK (Chamber of Commerce) number: 11033334

----- Original Message -----
> From: "Valerio Pachera" <sirio81 at gmail.com>
> To: "Lista sheepdog user" <sheepdog-users at lists.wpkg.org>
> Cc: "Alessandro Bolgia" <alessandro at extensys.it>
> Sent: Wednesday, November 19, 2014 12:03:58 PM
> Subject: Re: [sheepdog-users] Unexpected freeze of sheep on one node

> 2014-11-19 10:44 GMT+01:00 Micha Kersloot <micha at kovoks.nl>:
>> It looks like a network problem or failing harddrive to me.
> 
> It might be, but I should see some I/O errors in dmesg and there are none.
> Sheepdog should unplug the disk in such a case.
> Checking the s.m.a.r.t. status of the disks, they seem fine.
> A short s.m.a.r.t. test finds no problem.
> I'm now also running a long s.m.a.r.t. test, but it takes time.
> 
> The only unusual thing I see in dmesg is this:
> ---
> [3914310.533309] br51: port 2(zscloudappLAN) received tcn bpdu
> [3914310.533313] br51: topology change detected, sending tcn bpdu
> ---
> There are many of these messages:
> dmesg | grep -c 'br51: topology change detected'
> 545
> But only 3 in kern.log:
> grep 'br51: topology change detected' /var/log/kern.log
> Nov 18 01:47:50 sheepdog004 kernel: [3914290.742630] br51: topology change detected, sending tcn bpdu
> Nov 18 01:48:10 sheepdog004 kernel: [3914310.533313] br51: topology change detected, sending tcn bpdu
> Nov 18 01:48:20 sheepdog004 kernel: [3914320.488509] br51: topology change detected, sending tcn bpdu
> 
> and these times do not match the time of the issue (23:00).
> 
> Note that zookeeper listens on br6, and the only relation between br6 and
> br51 is that they use the same physical NIC (eth0).
> Sheepdog complains about not being able to connect on the I/O NIC, which is eth1.5.
> This NIC has no bridges. It's dedicated to sheepdog I/O.
> I can ping it with no packet loss.
> Its link speed is ok:
> ethtool eth1
>        Speed: 1000Mb/s
>        Duplex: Full
> 
> I ran fsck.ext4 on the sheepdog devices and they are clean.
> 
> There's one weird thing:
> on the other two nodes (the good ones), I see several call traces in
> /var/log/messages.
> Most of them are related to swapper, a few others to sheep and qemu.
> 
> The next thing I'm going to try is:
>  reboot node id0
>  remove the metadata of node id0 and insert it back into the cluster.
> 
> If you have any other idea or wish to know more details, let me know.
> 
> Thank you.
> --
> sheepdog-users mailing lists
> sheepdog-users at lists.wpkg.org
> http://lists.wpkg.org/mailman/listinfo/sheepdog-users


