Hi.

I've just tried the new HEAD of devel, 99d7c0f327, and now the machines still in the network after a node has been killed never seem to eliminate it and recover:

0028# ip link set eth1 down

0026# collie vdi list
  name        id    size    used  shared    creation time   vdi id
------------------------------------------------------------------
[long hang]
failed to connect to 172.16.101.11:7001: No route to host
failed to connect 172.16.101.11:7001
failed to read a inode header
failed to connect to 172.16.101.11:7000: No route to host
failed to connect 172.16.101.11:7000
failed to read a inode header

[...wait a minute or two...]

0026# collie vdi list
  name        id    size    used  shared    creation time   vdi id
------------------------------------------------------------------
failed to connect to 172.16.101.11:7001: No route to host
failed to connect 172.16.101.11:7001
failed to read a inode header
failed to connect to 172.16.101.11:7000: No route to host
failed to connect 172.16.101.11:7000
failed to read a inode header

[...and even after ten minutes...]

0026# collie node list
 Idx - Host:Port           Vnodes       Zone
---------------------------------------------
   0 - 172.16.101.7:7000       64  124063916
   1 - 172.16.101.7:7001       64  124063916
   2 - 172.16.101.7:7002       64  124063916
   3 - 172.16.101.9:7000       64  157618348
   4 - 172.16.101.9:7001       64  157618348
   5 - 172.16.101.9:7002       64  157618348
   6 - 172.16.101.11:7000      64  191172780
   7 - 172.16.101.11:7001      64  191172780
   8 - 172.16.101.11:7002      64  191172780

0026# collie vdi list
  name        id    size    used  shared    creation time   vdi id
------------------------------------------------------------------
failed to connect to 172.16.101.11:7001: No route to host
failed to connect 172.16.101.11:7001
failed to read a inode header
failed to connect to 172.16.101.11:7000: No route to host
failed to connect 172.16.101.11:7000
failed to read a inode header

0026# collie vdi list
  name        id    size    used  shared    creation time   vdi id
------------------------------------------------------------------
failed to connect to 172.16.101.11:7001: No route to host
failed to connect 172.16.101.11:7001
failed to read a inode header
failed to connect to 172.16.101.11:7000: No route to host
failed to connect 172.16.101.11:7000
failed to read a inode header

0026# collie vdi list
  name        id    size    used  shared    creation time   vdi id
------------------------------------------------------------------
failed to connect to 172.16.101.11:7001: No route to host
failed to connect 172.16.101.11:7001
failed to read a inode header
failed to connect to 172.16.101.11:7000: No route to host
failed to connect 172.16.101.11:7000
failed to read a inode header

0026# collie node list
 Idx - Host:Port           Vnodes       Zone
---------------------------------------------
   0 - 172.16.101.7:7000       64  124063916
   1 - 172.16.101.7:7001       64  124063916
   2 - 172.16.101.7:7002       64  124063916
   3 - 172.16.101.9:7000       64  157618348
   4 - 172.16.101.9:7001       64  157618348
   5 - 172.16.101.9:7002       64  157618348
   6 - 172.16.101.11:7000      64  191172780
   7 - 172.16.101.11:7001      64  191172780
   8 - 172.16.101.11:7002      64  191172780

0026# collie node list
 Idx - Host:Port           Vnodes       Zone
---------------------------------------------
   0 - 172.16.101.7:7000       64  124063916
   1 - 172.16.101.7:7001       64  124063916
   2 - 172.16.101.7:7002       64  124063916
   3 - 172.16.101.9:7000       64  157618348
   4 - 172.16.101.9:7001       64  157618348
   5 - 172.16.101.9:7002       64  157618348
   6 - 172.16.101.11:7000      64  191172780
   7 - 172.16.101.11:7001      64  191172780
   8 - 172.16.101.11:7002      64  191172780

0026# collie node list
 Idx - Host:Port           Vnodes       Zone
---------------------------------------------
   0 - 172.16.101.7:7000       64  124063916
   1 - 172.16.101.7:7001       64  124063916
   2 - 172.16.101.7:7002       64  124063916
   3 - 172.16.101.9:7000       64  157618348
   4 - 172.16.101.9:7001       64  157618348
   5 - 172.16.101.9:7002       64  157618348
   6 - 172.16.101.11:7000      64  191172780
   7 - 172.16.101.11:7001      64  191172780
   8 - 172.16.101.11:7002      64  191172780

0026# echo /dev/sd[abcdefghijk]1
/dev/sda1 /dev/sdb1 /dev/sdc1

0026# echo /dev/sd[a-k]1
/dev/sda1 /dev/sdb1 /dev/sdc1

0026# collie node list
 Idx - Host:Port           Vnodes       Zone
---------------------------------------------
   0 - 172.16.101.7:7000       64  124063916
   1 - 172.16.101.7:7001       64  124063916
   2 - 172.16.101.7:7002       64  124063916
   3 - 172.16.101.9:7000       64  157618348
   4 - 172.16.101.9:7001       64  157618348
   5 - 172.16.101.9:7002       64  157618348
   6 - 172.16.101.11:7000      64  191172780
   7 - 172.16.101.11:7001      64  191172780
   8 - 172.16.101.11:7002      64  191172780

0026# collie vdi list
  name        id    size    used  shared    creation time   vdi id
------------------------------------------------------------------
failed to connect to 172.16.101.11:7001: No route to host
failed to connect 172.16.101.11:7001
failed to read a inode header
failed to connect to 172.16.101.11:7000: No route to host
failed to connect 172.16.101.11:7000
failed to read a inode header

I even powered off the 0028 machine to ensure I was fully isolating it, but the cluster never recovers.

Best wishes,

Chris.