[sheepdog-users] [corosync] Single disk getting full

Thu Aug 8 08:39:22 CEST 2013

Liu Yuan wrote:
> On Thu, Aug 08, 2013 at 02:23:11AM +0200, Valerio Pachera wrote:
>> Ouch! This time things went bad:
>> cluster has stopped.

...

>>
>> [note: 2]
>> Aug  7 21:00:07 sheepdog004 corosync[4365]:   [TOTEM ] Retransmit
>> List: 757f 7580 7581 7582 7583 7584 7585 7586 7587 7588 7589 758a 758b
>> 758c 758d 758e 758f 7590 7591 7592
>> Aug  7 21:00:07 sheepdog004 corosync[4365]:   [TOTEM ] Retransmit
>> List: 757f 7580 7581 7582 7583 7584 7585 7586 7587 7588 7589 758a 758b
>> 758c 758d 758e 758f 7590 7591 7592 7593 7594 7595 7596 7597 7598 7599
>> 759a 759b 759c
>> ...
> 
> Hello corosync guys, is this normal? Sheep daemon detected a network partition.
> 

A few retransmitted packets are pretty normal because corosync uses
UDP. On the other hand, what you've sent doesn't look normal. This kind
of output is always related to a networking issue. So:
- How often do you get these messages?
  - Every time a node starts? Then the problem is with
multicast/switch/firewall. Just make sure multicast works (you can use
omping for that).
  - After ~two minutes of running? It may be a known problem in kernel
multicast: https://bugzilla.redhat.com/show_bug.cgi?id=880035
- Is there any heavy I/O or CPU load that prevents corosync from being
scheduled properly?
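
To answer the "how often" question, something like the following
Python sketch could count the retransmit messages per minute. It
assumes syslog lines shaped like the ones quoted above (timestamp,
host, "corosync[pid]:", then "[TOTEM ] Retransmit"); the function name
retransmits_per_minute is made up for illustration and is not part of
corosync.

```python
import re
from collections import Counter

# Assumed line shape, taken from the log excerpt in this thread:
# "Aug  7 21:00:07 sheepdog004 corosync[4365]:   [TOTEM ] Retransmit ..."
RETRANSMIT = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}):\d{2}\s+\S+\s+"
    r"corosync\[\d+\]:\s+\[TOTEM \] Retransmit"
)

def retransmits_per_minute(lines):
    """Count '[TOTEM ] Retransmit' messages, grouped by minute."""
    counts = Counter()
    for line in lines:
        m = RETRANSMIT.match(line)
        if m:
            # Collapse the padded day field ("Aug  7" -> "Aug 7").
            minute = " ".join(m.group(1).split())
            counts[minute] += 1
    return counts
```

Feeding it the lines of the node's syslog file (the path depends on the
distribution) gives a per-minute count. A burst right after startup
points at the multicast/switch/firewall case, while a burst that begins
about two minutes in points at the kernel multicast case.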

Regards,
  Honza