[Sheepdog] create_cluster(1652) Failed to join the sheepdog group, try again

Steven Dake sdake at redhat.com
Tue Sep 14 21:28:45 CEST 2010


On 09/12/2010 08:30 AM, Narendra Prasad Madanapalli wrote:
> Hi Kazutaka,
>
> Surprisingly, it works today: corosync -f does not throw any errors.
> However, ping6 throws the following error:
>
> # ping6 fe80::21e:4cff:fe59:9936
> connect: Invalid argument
>
>

You may have a configuration problem with your interfaces not setting up 
routes properly.

Regards
-steve
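
A note on the ping6 failure quoted above: fe80::21e:4cff:fe59:9936 is a
link-local address, and on Linux such an address is only meaningful
together with an interface, so ping6 usually needs "-I wlan0" or a
"%wlan0" suffix (wlan0 being the interface that carries this address in
the ifconfig output further down). Whether that is the whole story here
is not confirmed anywhere in this thread. As a minimal sketch, using only
Python's standard ipaddress module, the two addresses from the quoted
corosync.conf can be classified like this; the helper name describe() is
made up for illustration:

```python
import ipaddress


def describe(addr: str) -> str:
    """Classify an IPv6 address by the scope rules relevant to this thread."""
    ip = ipaddress.IPv6Address(addr)
    if ip.is_link_local:
        # fe80::/10 is valid on a single link only, so tools need an interface.
        return "link-local (needs an interface, e.g. addr%wlan0)"
    if ip.is_multicast:
        # The low 4 bits of the second byte hold the multicast scope.
        scope = ip.packed[1] & 0x0F
        return f"multicast, scope {scope:#x}"
    return "other"


print(describe("fe80::21e:4cff:fe59:9936"))  # bindnetaddr from the thread
print(describe("ff05::1"))                   # mcastaddr from the thread
```

Scope 0x5 is site-local multicast, while the bind address is link-local.
The sketch only classifies the addresses; it does not show that this
difference causes the join failure.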

> =====corosync.log
> Sep 13 02:20:58 corosync [MAIN  ] Corosync Cluster Engine ('1.2.7'):
> started and ready to provide service.
> Sep 13 02:20:58 corosync [MAIN  ] Corosync built-in features: nss rdma
> Sep 13 02:20:58 corosync [MAIN  ] Successfully read main configuration
> file '/etc/corosync/corosync.conf'.
> Sep 13 02:20:58 corosync [TOTEM ] Initializing transport (UDP/IP).
> Sep 13 02:20:58 corosync [TOTEM ] Initializing transmit/receive
> security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
> Sep 13 02:20:58 corosync [TOTEM ] The network interface
> [fe80::21e:4cff:fe59:9936] is now up.
> Sep 13 02:20:58 corosync [SERV  ] Service engine loaded: corosync
> extended virtual synchrony service
> Sep 13 02:20:58 corosync [SERV  ] Service engine loaded: corosync
> configuration service
> Sep 13 02:20:58 corosync [SERV  ] Service engine loaded: corosync
> cluster closed process group service v1.01
> Sep 13 02:20:58 corosync [SERV  ] Service engine loaded: corosync
> cluster config database access v1.01
> Sep 13 02:20:58 corosync [SERV  ] Service engine loaded: corosync
> profile loading service
> Sep 13 02:20:58 corosync [SERV  ] Service engine loaded: corosync
> cluster quorum service v0.1
> Sep 13 02:20:58 corosync [MAIN  ] Compatibility mode set to whitetank.
>   Using V1 and V2 of the synchronization engine.
> Sep 13 02:20:58 corosync [TOTEM ] A processor joined or left the
> membership and a new membership was formed.
> Sep 13 02:20:58 corosync [MAIN  ] Completed service synchronization,
> ready to provide service.
> Sep 13 02:21:32 corosync [SERV  ] Unloading all Corosync service engines.
> Sep 13 02:21:32 corosync [SERV  ] Service engine unloaded: corosync
> extended virtual synchrony service
> Sep 13 02:21:32 corosync [SERV  ] Service engine unloaded: corosync
> configuration service
> Sep 13 02:21:32 corosync [SERV  ] Service engine unloaded: corosync
> cluster closed process group service v1.01
> Sep 13 02:21:32 corosync [SERV  ] Service engine unloaded: corosync
> cluster config database access v1.01
> Sep 13 02:21:32 corosync [SERV  ] Service engine unloaded: corosync
> profile loading service
> Sep 13 02:21:32 corosync [SERV  ] Service engine unloaded: corosync
> cluster quorum service v0.1
> Sep 13 02:21:32 corosync [MAIN  ] Corosync Cluster Engine exiting with
> status 0 at main.c:170.
> Sep 13 02:23:27 corosync [MAIN  ] Corosync Cluster Engine ('1.2.7'):
> started and ready to provide service.
> Sep 13 02:23:27 corosync [MAIN  ] Corosync built-in features: nss rdma
> Sep 13 02:23:27 corosync [MAIN  ] Successfully read main configuration
> file '/etc/corosync/corosync.conf'
> ==============
>
>
> Thanks,
> Narendra.
>
> On Mon, Sep 6, 2010 at 1:36 PM, MORITA Kazutaka
> <morita.kazutaka at lab.ntt.co.jp> wrote:
>> At Sat, 4 Sep 2010 11:36:39 +0530,
>> Narendra Prasad Madanapalli wrote:
>>>
>>> Hi Steve,
>>>
>>> Please find below the output of ifconfig and the contents of corosync.log
>>>
>>> ===========ifconfig
>>> [nlakn at naninf ~]$ ifconfig
>>> eth0      Link encap:Ethernet  HWaddr 00:1B:24:69:92:11
>>>            UP BROADCAST MULTICAST  MTU:1500  Metric:1
>>>            RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>            TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>>            collisions:0 txqueuelen:1000
>>>            RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
>>>            Interrupt:27 Base address:0xa000
>>>
>>> lo        Link encap:Local Loopback
>>>            inet addr:127.0.0.1  Mask:255.0.0.0
>>>            inet6 addr: ::1/128 Scope:Host
>>>            UP LOOPBACK RUNNING  MTU:16436  Metric:1
>>>            RX packets:8 errors:0 dropped:0 overruns:0 frame:0
>>>            TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
>>>            collisions:0 txqueuelen:0
>>>            RX bytes:480 (480.0 b)  TX bytes:480 (480.0 b)
>>>
>>> virbr0    Link encap:Ethernet  HWaddr 22:83:14:EC:B9:66
>>>            inet addr:192.168.122.1  Bcast:192.168.122.255  Mask:255.255.255.0
>>>            UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>>            RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>>            TX packets:16 errors:0 dropped:0 overruns:0 carrier:0
>>>            collisions:0 txqueuelen:0
>>>            RX bytes:0 (0.0 b)  TX bytes:3925 (3.8 KiB)
>>>
>>> wlan0     Link encap:Ethernet  HWaddr 00:1E:4C:59:99:36
>>>            inet addr:192.168.1.2  Bcast:192.168.1.255  Mask:255.255.255.0
>>>            inet6 addr: fe80::21e:4cff:fe59:9936/64 Scope:Link
>>>            UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>>            RX packets:1671 errors:0 dropped:0 overruns:0 frame:0
>>>            TX packets:1555 errors:0 dropped:0 overruns:0 carrier:0
>>>            collisions:0 txqueuelen:1000
>>>            RX bytes:1724236 (1.6 MiB)  TX bytes:285862 (279.1 KiB)
>>> ======================================================================
>>>
>>> =====corosync.log
>>>   Sep 03 23:17:39 corosync [MAIN  ] Corosync Cluster Engine ('1.2.7'):
>>> started and ready to provide service.
>>> Sep 03 23:17:39 corosync [MAIN  ] Corosync built-in features: nss rdma
>>> Sep 03 23:17:39 corosync [MAIN  ] Successfully read main configuration
>>> file '/etc/corosync/corosync.conf'.
>>> Sep 03 23:17:39 corosync [TOTEM ] Initializing transport (UDP/IP).
>>> Sep 03 23:17:39 corosync [TOTEM ] Initializing transmit/receive
>>> security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
>>> Sep 03 23:17:39 corosync [TOTEM ] Could not set traffic priority.
>>> (Socket operation on non-socket)
>>
>> It seems that corosync couldn't create the socket for some reason.
>>
>> Can you try 'corosync -f' to run corosync in the foreground?  Corosync
>> uses perror to print some socket errors, so the output may tell us
>> the reason for the error.
>>
>>
>> Thanks
>>
>> Kazutaka
>>
>>
>>> Sep 03 23:17:39 corosync [TOTEM ] The network interface
>>> [fe80::21e:4cff:fe59:9936] is now up.
>>> Sep 03 23:17:39 corosync [SERV  ] Service engine loaded: corosync
>>> extended virtual synchrony service
>>> Sep 03 23:17:39 corosync [SERV  ] Service engine loaded: corosync
>>> configuration service
>>> Sep 03 23:17:39 corosync [SERV  ] Service engine loaded: corosync
>>> cluster closed process group service v1.01
>>> Sep 03 23:17:39 corosync [SERV  ] Service engine loaded: corosync
>>> cluster config database access v1.01
>>> Sep 03 23:17:39 corosync [SERV  ] Service engine loaded: corosync
>>> profile loading service
>>> Sep 03 23:17:39 corosync [SERV  ] Service engine loaded: corosync
>>> cluster quorum service v0.1
>>> Sep 03 23:17:39 corosync [MAIN  ] Compatibility mode set to whitetank.
>>>   Using V1 and V2 of the synchronization engine.
>>> ==================================
>>>
>>>
>>> I observed that the computer responds very slowly to keyboard & mouse
>>> events when corosync is started with an IPv6 bindnetaddr.
>>>
>>>
>>> Thanks,
>>> Narendra.
>>>
>>> On Sat, Sep 4, 2010 at 2:36 AM, Steven Dake <sdake at redhat.com> wrote:
>>>> Perhaps your IPv6 interface isn't set up properly, or for some reason
>>>> corosync can't bind to it or to the multicast address.  Can you attach
>>>> /var/log/cluster/corosync.log and the output of ifconfig?
>>>>
>>>> Thanks
>>>> -steve
>>>>
>>>> On 09/03/2010 12:36 PM, Narendra Prasad Madanapalli wrote:
>>>>>
>>>>> Thanks, Steve. It works on Fedora 13 after disabling SELinux and the
>>>>> firewall. I encounter a similar problem when corosync is started with
>>>>> an IPv6 address specified in corosync.conf, as follows:
>>>>>
>>>>> =======corosync.conf
>>>>> compatibility: whitetank
>>>>>
>>>>> totem {
>>>>>          version: 2
>>>>>          secauth: off
>>>>>          threads: 0
>>>>>          nodeid: 1
>>>>>          interface {
>>>>>                  ringnumber: 0
>>>>>                  nodeid: 1
>>>>>                  bindnetaddr: fe80::21e:4cff:fe59:9936
>>>>>                  mcastaddr:  ff05::1
>>>>>                  mcastport: 5405
>>>>>          }
>>>>> }
>>>>>
>>>>> logging {
>>>>>          fileline: off
>>>>>          to_stderr: no
>>>>>          to_logfile: yes
>>>>>          to_syslog: yes
>>>>>          logfile: /var/log/cluster/corosync.log
>>>>>          debug: off
>>>>>          timestamp: on
>>>>>          logger_subsys {
>>>>>                  subsys: AMF
>>>>>                  debug: off
>>>>>          }
>>>>> }
>>>>>
>>>>> amf {
>>>>>          mode: disabled
>>>>>
>>>>> ===================
>>>>>
>>>>> Corosync started successfully, but sheepdog throws the same 'try again'
>>>>> errors in sheepdog.log. I made sure ip6tables was stopped before
>>>>> starting sheepdog. I am now trying to fix addr_to_str() to support
>>>>> IPv6 addresses. I would appreciate any pointers to overcome
>>>>> this error.
>>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Narendra.
>>>>>
>>>>> On Wed, Aug 11, 2010 at 9:47 PM, Steven Dake <sdake at redhat.com> wrote:
>>>>>>
>>>>>> On 08/11/2010 09:10 AM, Narendra Prasad Madanapalli wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I encounter the error mentioned in the subject when sheep is started.
>>>>>>>
>>>>>>> I would appreciate it if someone could help me overcome this issue.
>>>>>>>
>>>>>>> Here are the details of corosync & sheepdog:
>>>>>>>
>>>>>>> OS Distro: FC11
>>>>>>>
>>>>>>> Corosync:
>>>>>>> corosynclib-devel-1.2.3-1.fc11.i586
>>>>>>> corosync-1.2.3-1.fc11.i586
>>>>>>> corosynclib-1.2.3-1.fc11.i586
>>>>>>>
>>>>>>
>>>>>> You may have iptables enabled, which blocks corosync from executing.
>>>>>> Another common problem is selinux is enabled, which only works well
>>>>>> on newer fedora versions.
>>>>>>
>>>>>> Regards
>>>>>> -steve
>>>>>>
>>>>>>> Corosync log contents when it is started:
>>>>>>> Aug 11 09:29:36 corosync [MAIN  ] Corosync Cluster Engine ('1.2.3'):
>>>>>>> started and ready to provide service.
>>>>>>> Aug 11 09:29:36 corosync [MAIN  ] Corosync built-in features: nss rdma
>>>>>>> Aug 11 09:29:36 corosync [MAIN  ] Successfully read main configuration
>>>>>>> file '/etc/corosync/corosync.conf'.
>>>>>>> Aug 11 09:29:36 corosync [TOTEM ] Initializing transport (UDP/IP).
>>>>>>> Aug 11 09:29:36 corosync [TOTEM ] Initializing transmit/receive
>>>>>>> security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
>>>>>>> Aug 11 09:29:36 corosync [TOTEM ] The network interface
>>>>>>> [192.168.122.1] is now up.
>>>>>>> Aug 11 09:29:36 corosync [SERV  ] Service engine loaded: corosync
>>>>>>> extended virtual synchrony service
>>>>>>> Aug 11 09:29:36 corosync [SERV  ] Service engine loaded: corosync
>>>>>>> configuration service
>>>>>>> Aug 11 09:29:36 corosync [SERV  ] Service engine loaded: corosync
>>>>>>> cluster closed process group service v1.01
>>>>>>> Aug 11 09:29:36 corosync [SERV  ] Service engine loaded: corosync
>>>>>>> cluster config database access v1.01
>>>>>>> Aug 11 09:29:36 corosync [SERV  ] Service engine loaded: corosync
>>>>>>> profile loading service
>>>>>>> Aug 11 09:29:36 corosync [SERV  ] Service engine loaded: corosync
>>>>>>> cluster quorum service v0.1
>>>>>>> Aug 11 09:29:36 corosync [MAIN  ] Compatibility mode set to whitetank.
>>>>>>>   Using V1 and V2 of the synchronization engine.
>>>>>>>
>>>>>>>
>>>>>>> corosync.conf:
>>>>>>> # cat /etc/corosync/corosync.conf
>>>>>>> # Please read the corosync.conf.5 manual page
>>>>>>> compatibility: whitetank
>>>>>>>
>>>>>>> totem {
>>>>>>>         version: 2
>>>>>>>         secauth: off
>>>>>>>         threads: 0
>>>>>>>         interface {
>>>>>>>                 ringnumber: 0
>>>>>>>                 bindnetaddr: 192.168.122.1
>>>>>>>                 mcastaddr: 226.94.1.1
>>>>>>>                 mcastport: 5405
>>>>>>>         }
>>>>>>> }
>>>>>>>
>>>>>>> logging {
>>>>>>>         fileline: off
>>>>>>>         to_stderr: yes
>>>>>>>         to_logfile: yes
>>>>>>>         to_syslog: yes
>>>>>>>         logfile: /tmp/corosync.log
>>>>>>>         debug: off
>>>>>>>         timestamp: on
>>>>>>>         logger_subsys {
>>>>>>>                 subsys: AMF
>>>>>>>                 debug: off
>>>>>>>         }
>>>>>>> }
>>>>>>>
>>>>>>> amf {
>>>>>>>         mode: disabled
>>>>>>> }
>>>>>>>
>>>>>>> sheepdog.log:
>>>>>>> Aug 11 09:48:05 worker_routine(215) started this thread 60
>>>>>>> Aug 11 09:48:05 worker_routine(215) started this thread 61
>>>>>>> Aug 11 09:48:05 worker_routine(215) started this thread 62
>>>>>>> Aug 11 09:48:05 worker_routine(215) started this thread 63
>>>>>>> Aug 11 09:48:06 create_cluster(1652) Failed to join the sheepdog
>>>>>>> group, try again
>>>>>>> Aug 11 09:48:07 create_cluster(1652) Failed to join the sheepdog
>>>>>>> group, try again
>>>>>>> Aug 11 09:48:08 create_cluster(1652) Failed to join the sheepdog
>>>>>>> group, try again
>>>>>>> Aug 11 09:48:09 create_cluster(1652) Failed to join the sheepdog
>>>>>>> group, try again
>>>>>>> Aug 11 09:48:10 create_cluster(1652) Failed to join the sheepdog
>>>>>>> group, try again
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Narendra.
>>>>>>
>>>>>>
>>>>
>>>>
>>
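
On the perror hint quoted above: the "Socket operation on non-socket"
text in corosync.log is the standard error string for ENOTSOCK, which the
kernel returns when a socket call such as setsockopt() is made on a
descriptor that is not a socket. The sketch below reproduces that errno
outside corosync by calling the C library's setsockopt() on a pipe. It
assumes Linux with a libc reachable through ctypes, it is not corosync's
code, and the helper name setsockopt_errno() is made up for illustration:

```python
import ctypes
import errno
import os
import socket


def setsockopt_errno(fd: int) -> int:
    """Call libc setsockopt() on fd; return the resulting errno (0 on success)."""
    libc = ctypes.CDLL(None, use_errno=True)
    value = ctypes.c_int(1)
    rc = libc.setsockopt(
        fd,
        socket.SOL_SOCKET,
        socket.SO_REUSEADDR,  # any option works; the fd type is checked first
        ctypes.byref(value),
        ctypes.sizeof(value),
    )
    return ctypes.get_errno() if rc != 0 else 0


# A pipe is a valid file descriptor, but it is not a socket.
read_fd, write_fd = os.pipe()
try:
    err = setsockopt_errno(read_fd)
finally:
    os.close(read_fd)
    os.close(write_fd)

print(errno.errorcode.get(err), "-", os.strerror(err))
```

If the message in the log comes from the same errno, then corosync was
making a socket call on a descriptor that was not a valid socket, which
fits the guess above that socket creation had failed.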



