[sheepdog] I got a "Waiting for other nodes to join cluster"

Brook jingyu_chen at hotmail.com
Mon Aug 20 04:17:53 CEST 2012


Hi:

> The 36th version has 6 nodes: 192.168.146.[65-69,71], but in the 37th
> version you didn't start the 192.168.146.71 node.
> 

 Thank you for your reply.
    "192.168.146.71" and "192.168.146.69" are gateways, and I've shut them down. Now the cluster contains only 4 nodes:
      192.168.146.65:7000       192.168.146.66:7000      192.168.146.67:7000     192.168.146.68:7000
I think there should be a 38th epoch containing those 4 nodes, but for some reason I don't know, the nodes can't form a complete cluster.
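
To make the mismatch concrete, here is a minimal sketch in plain Python (not part of sheepdog or collie) that compares the node list recorded in an epoch with the nodes that have currently joined, using the addresses from the quoted `collie cluster info` and `collie node list` output. The function name `missing_nodes` and the hard-coded lists are only for illustration, and the sketch shows the difference in membership only; it does not claim to reproduce how sheep decides when to leave the waiting state.

```python
def missing_nodes(epoch_nodes, joined_nodes):
    """Return the nodes recorded in an epoch that have not joined, sorted."""
    return sorted(set(epoch_nodes) - set(joined_nodes))


# Node lists copied from the quoted output (epochs 36 and 37, and `collie node list`).
epoch_36 = [f"192.168.146.{i}:7000" for i in (65, 66, 67, 68, 69, 71)]
epoch_37 = [f"192.168.146.{i}:7000" for i in (65, 66, 67, 68, 69)]
joined = [f"192.168.146.{i}:7000" for i in (65, 66, 67, 68)]

print("missing from epoch 37:", missing_nodes(epoch_37, joined))
print("missing from epoch 36:", missing_nodes(epoch_36, joined))
```

With these lists, 192.168.146.69:7000 is the only node from epoch 37 that is not currently joined.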


--------------------------------------------------
From: "Yunkai Zhang" <yunkai.me at gmail.com>
Sent: Monday, August 20, 2012 10:04 AM
To: "Brook" <jingyu_chen at hotmail.com>
Cc: <sheepdog at lists.wpkg.org>
Subject: Re: [sheepdog] I got a"Waiting for other nodes to join cluster"

> On Mon, Aug 20, 2012 at 9:57 AM, Brook <jingyu_chen at hotmail.com> wrote:
>> Hi All.
>>     After a switch failure, my sheepdog cluster won't run.
>>     I got the messages below. What should I do?
>> [root at 17-IDC-D-2115 ~]# collie vdi list
>>   Name        Id    Size    Used  Shared    Creation time   VDI id  Tag
>> Failed to read object 8083d2b800000000 Waiting for other nodes to join
>> cluster
>> Failed to read inode header
>> Failed to read object 8083d46b00000000 Waiting for other nodes to join
>> cluster
>> Failed to read inode header
>> Failed to read object 8083d7d100000000 Waiting for other nodes to join
>> cluster
>> Failed to read inode header
>> Failed to read object 8083d7d200000000 Waiting for other nodes to join
>> cluster
>> Failed to read inode header
>> Failed to read object 8083db3700000000 Waiting for other nodes to join
>> cluster
>> Failed to read inode header
>> Failed to read object 8083de9d00000000 Waiting for other nodes to join
>> cluster
>> Failed to read inode header
>> Failed to read object 809bee7c00000000 Waiting for other nodes to join
>> cluster
>> Failed to read inode header
>> Failed to read object 809bf02f00000000 Waiting for other nodes to join
>> cluster
>> ......
>>
>> [root at 17-IDC-D-2115 ~]# collie vdi create vol1 1G
>> Failed to create VDI vol1: Waiting for other nodes to join cluster
>>
>> [root at 17-IDC-D-2115 ~]# corosync-cpgtool
>> Group Name             PID         Node ID
>> sheepdog
>>                      12747      1100130496 (192.168.146.65)
>>                       8228      1150462144 (192.168.146.68)
>>                      15328      1133684928 (192.168.146.67)
>>                       2076      1116907712 (192.168.146.66)
>>
>> [root at 17-IDC-D-2115 ~]# collie node list
>> M   Id   Host:Port         V-Nodes       Zone
>> -    0   192.168.146.65:7000    64 1100130496
>> -    1   192.168.146.66:7000    64 1116907712
>> -    2   192.168.146.67:7000    64 1133684928
>> -    3   192.168.146.68:7000    64 1150462144
>>
>> [root at 17-IDC-D-2115 ~]# collie node info
>> Id      Size    Used    Use%
>> Cannot get information from any nodes
>>
>> [root at 17-IDC-D-2115 ~]# collie cluster info
>> Cluster status: Waiting for other nodes to join cluster
>>
>> Cluster created at Mon Jul  9 16:57:18 2012
>>
>> Epoch Time           Version
>> 2012-08-16 14:51:45     37 [192.168.146.65:7000, 192.168.146.66:7000,
>> 192.168.146.67:7000, 192.168.146.68:7000, 192.168.146.69:7000]
>> 2012-08-16 14:51:44     36 [192.168.146.65:7000, 192.168.146.66:7000,
>> 192.168.146.67:7000, 192.168.146.68:7000, 192.168.146.69:7000,
>> 192.168.146.71:7000]
> 
> The 36th version has 6 nodes: 192.168.146.[65-69,71], but in the 37th
> version you didn't start the 192.168.146.71 node.
> 
>> ......
>>
>> [root at 17-IDC-D-2115 ~]# tail -n30 /data/sheepdog/sheep.log
>> Aug 20 09:28:11 [main] listen_handler(819) accepted a new connection: 13
>> Aug 20 09:28:11 [main] client_rx_handler(577) connection from: 13, ::1:45306
>> Aug 20 09:28:11 [main] queue_request(323) 82
>> Aug 20 09:28:11 [io 18] do_process_work(990) 82, 0 , 37
>> Aug 20 09:28:11 [main] client_tx_handler(663) connection from: 13, ::1:45306
>> Aug 20 09:28:11 [main] client_handler(764) connection seems to be dead
>> Aug 20 09:28:11 [main] clear_client(703) refcnt:0, fd:13, ::1:45306
>> Aug 20 09:28:11 [main] destroy_client(672) connection from: ::1:45306
>> Aug 20 09:28:11 [main] listen_handler(819) accepted a new connection: 13
>> Aug 20 09:28:11 [main] client_rx_handler(577) connection from: 13, ::1:45307
>> Aug 20 09:28:11 [main] queue_request(323) 11
>> Aug 20 09:28:11 [main] client_tx_handler(663) connection from: 13, ::1:45307
>> Aug 20 09:28:11 [main] client_handler(764) connection seems to be dead
>> Aug 20 09:28:11 [main] clear_client(703) refcnt:0, fd:13, ::1:45307
>> Aug 20 09:28:11 [main] destroy_client(672) connection from: ::1:45307
>> Aug 20 09:28:13 [main] listen_handler(819) accepted a new connection: 13
>> Aug 20 09:28:13 [main] client_rx_handler(577) connection from: 13, ::1:45308
>> Aug 20 09:28:13 [main] queue_request(323) 82
>> Aug 20 09:28:13 [io 19] do_process_work(990) 82, 0 , 37
>> Aug 20 09:28:13 [main] client_tx_handler(663) connection from: 13, ::1:45308
>> Aug 20 09:28:13 [main] client_handler(764) connection seems to be dead
>> Aug 20 09:28:13 [main] clear_client(703) refcnt:0, fd:13, ::1:45308
>> Aug 20 09:28:13 [main] destroy_client(672) connection from: ::1:45308
>> Aug 20 09:28:13 [main] listen_handler(819) accepted a new connection: 13
>> Aug 20 09:28:13 [main] client_rx_handler(577) connection from: 13, ::1:45309
>> Aug 20 09:28:13 [main] queue_request(323) 11
>> Aug 20 09:28:13 [main] client_tx_handler(663) connection from: 13, ::1:45309
>> Aug 20 09:28:13 [main] client_handler(764) connection seems to be dead
>> Aug 20 09:28:13 [main] clear_client(703) refcnt:0, fd:13, ::1:45309
>> Aug 20 09:28:13 [main] destroy_client(672) connection from: ::1:45309
>>
>>
>> --
>> sheepdog mailing list
>> sheepdog at lists.wpkg.org
>> http://lists.wpkg.org/mailman/listinfo/sheepdog
>>
> 
> 
> 
> -- 
> Yunkai Zhang
> Work at Taobao
> 

