[sheepdog] I got a "Waiting for other nodes to join cluster"

Yunkai Zhang yunkai.me at gmail.com
Mon Aug 20 04:04:48 CEST 2012


On Mon, Aug 20, 2012 at 9:57 AM, Brook <jingyu_chen at hotmail.com> wrote:
> Hi All.
>     After a failure of switcher, my sheepdog cluster can't run.
>     I got the message below,  what should i do ?
> [root at 17-IDC-D-2115 ~]# collie vdi list
>   Name        Id    Size    Used  Shared    Creation time   VDI id  Tag
> Failed to read object 8083d2b800000000 Waiting for other nodes to join
> cluster
> Failed to read inode header
> Failed to read object 8083d46b00000000 Waiting for other nodes to join
> cluster
> Failed to read inode header
> Failed to read object 8083d7d100000000 Waiting for other nodes to join
> cluster
> Failed to read inode header
> Failed to read object 8083d7d200000000 Waiting for other nodes to join
> cluster
> Failed to read inode header
> Failed to read object 8083db3700000000 Waiting for other nodes to join
> cluster
> Failed to read inode header
> Failed to read object 8083de9d00000000 Waiting for other nodes to join
> cluster
> Failed to read inode header
> Failed to read object 809bee7c00000000 Waiting for other nodes to join
> cluster
> Failed to read inode header
> Failed to read object 809bf02f00000000 Waiting for other nodes to join
> cluster
> ......
>
> [root at 17-IDC-D-2115 ~]# collie vdi create vol1 1G
> Failed to create VDI vol1: Waiting for other nodes to join cluster
>
> [root at 17-IDC-D-2115 ~]# corosync-cpgtool
> Group Name             PID         Node ID
> sheepdog
>                      12747      1100130496 (192.168.146.65)
>                       8228      1150462144 (192.168.146.68)
>                      15328      1133684928 (192.168.146.67)
>                       2076      1116907712 (192.168.146.66)
>
> [root at 17-IDC-D-2115 ~]# collie node list
> M   Id   Host:Port         V-Nodes       Zone
> -    0   192.168.146.65:7000    64 1100130496
> -    1   192.168.146.66:7000    64 1116907712
> -    2   192.168.146.67:7000    64 1133684928
> -    3   192.168.146.68:7000    64 1150462144
>
> [root at 17-IDC-D-2115 ~]# collie node info
> Id      Size    Used    Use%
> Cannot get information from any nodes
>
> [root at 17-IDC-D-2115 ~]# collie cluster info
> Cluster status: Waiting for other nodes to join cluster
>
> Cluster created at Mon Jul  9 16:57:18 2012
>
> Epoch Time           Version
> 2012-08-16 14:51:45     37 [192.168.146.65:7000, 192.168.146.66:7000,
> 192.168.146.67:7000, 192.168.146.68:7000, 192.168.146.69:7000]
> 2012-08-16 14:51:44     36 [192.168.146.65:7000, 192.168.146.66:7000,
> 192.168.146.67:7000, 192.168.146.68:7000, 192.168.146.69:7000,
> 192.168.146.71:7000]

The 36th version has 6 nodes: 192.168.146.[65-69,71], but in the 37th
version you didn't start the 192.168.146.71 node.
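
To make the comparison concrete, here is a small sketch in Python that
diffs the membership lists copied from your output. It is not part of
sheepdog or collie; the helper names (hosts, missing) are made up for
illustration, and it only does set arithmetic on the addresses shown
above.

```python
# Illustrative only: plain set arithmetic on the node lists quoted in this
# thread. Nothing here calls sheepdog; hosts() and missing() are made-up helpers.

PREFIX = "192.168.146"


def hosts(prefix, last_octets):
    """Build "host:port" strings (port 7000, as in the quoted output)."""
    return {f"{prefix}.{n}:7000" for n in last_octets}


def missing(expected, actual):
    """Return the nodes present in `expected` but absent from `actual`."""
    return sorted(expected - actual)


# Copied from `collie cluster info` (versions 36 and 37).
epoch_36 = hosts(PREFIX, [65, 66, 67, 68, 69, 71])
epoch_37 = hosts(PREFIX, [65, 66, 67, 68, 69])

# Copied from `collie node list` (nodes joined right now).
joined_now = hosts(PREFIX, [65, 66, 67, 68])

print("in version 36 but not in 37:", missing(epoch_36, epoch_37))
print("in version 37 but not joined now:", missing(epoch_37, joined_now))
```

Going by the lists you pasted, the second diff also shows that
192.168.146.69 is in the 37th version but is not in your current
`collie node list` output.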

> ......
>
> [root at 17-IDC-D-2115 ~]# tail -n30 /data/sheepdog/sheep.log
> Aug 20 09:28:11 [main] listen_handler(819) accepted a new connection: 13
> Aug 20 09:28:11 [main] client_rx_handler(577) connection from: 13, ::1:45306
> Aug 20 09:28:11 [main] queue_request(323) 82
> Aug 20 09:28:11 [io 18] do_process_work(990) 82, 0 , 37
> Aug 20 09:28:11 [main] client_tx_handler(663) connection from: 13, ::1:45306
> Aug 20 09:28:11 [main] client_handler(764) connection seems to be dead
> Aug 20 09:28:11 [main] clear_client(703) refcnt:0, fd:13, ::1:45306
> Aug 20 09:28:11 [main] destroy_client(672) connection from: ::1:45306
> Aug 20 09:28:11 [main] listen_handler(819) accepted a new connection: 13
> Aug 20 09:28:11 [main] client_rx_handler(577) connection from: 13, ::1:45307
> Aug 20 09:28:11 [main] queue_request(323) 11
> Aug 20 09:28:11 [main] client_tx_handler(663) connection from: 13, ::1:45307
> Aug 20 09:28:11 [main] client_handler(764) connection seems to be dead
> Aug 20 09:28:11 [main] clear_client(703) refcnt:0, fd:13, ::1:45307
> Aug 20 09:28:11 [main] destroy_client(672) connection from: ::1:45307
> Aug 20 09:28:13 [main] listen_handler(819) accepted a new connection: 13
> Aug 20 09:28:13 [main] client_rx_handler(577) connection from: 13, ::1:45308
> Aug 20 09:28:13 [main] queue_request(323) 82
> Aug 20 09:28:13 [io 19] do_process_work(990) 82, 0 , 37
> Aug 20 09:28:13 [main] client_tx_handler(663) connection from: 13, ::1:45308
> Aug 20 09:28:13 [main] client_handler(764) connection seems to be dead
> Aug 20 09:28:13 [main] clear_client(703) refcnt:0, fd:13, ::1:45308
> Aug 20 09:28:13 [main] destroy_client(672) connection from: ::1:45308
> Aug 20 09:28:13 [main] listen_handler(819) accepted a new connection: 13
> Aug 20 09:28:13 [main] client_rx_handler(577) connection from: 13, ::1:45309
> Aug 20 09:28:13 [main] queue_request(323) 11
> Aug 20 09:28:13 [main] client_tx_handler(663) connection from: 13, ::1:45309
> Aug 20 09:28:13 [main] client_handler(764) connection seems to be dead
> Aug 20 09:28:13 [main] clear_client(703) refcnt:0, fd:13, ::1:45309
> Aug 20 09:28:13 [main] destroy_client(672) connection from: ::1:45309
>
>
> --
> sheepdog mailing list
> sheepdog at lists.wpkg.org
> http://lists.wpkg.org/mailman/listinfo/sheepdog
>



-- 
Yunkai Zhang
Work at Taobao


