[Sheepdog] Sheepdog 0.3.0 schedule and 0.4.0 plan

Chris Webb chris at arachsys.com
Fri Nov 25 15:13:45 CET 2011


Hi. I've just tried the new HEAD of devel, 99d7c0f327, and now the machines
still in the network after a node has been killed never seem to eliminate it
and recover:

0028# ip link set eth1 down

0026# collie vdi list
  name        id    size    used  shared    creation time   vdi id
------------------------------------------------------------------
[long hang]
failed to connect to 172.16.101.11:7001: No route to host
failed to connect 172.16.101.11:7001
failed to read a inode header
failed to connect to 172.16.101.11:7000: No route to host
failed to connect 172.16.101.11:7000
failed to read a inode header

[...wait a minute or two...]

0026# collie vdi list
  name        id    size    used  shared    creation time   vdi id
------------------------------------------------------------------
failed to connect to 172.16.101.11:7001: No route to host
failed to connect 172.16.101.11:7001
failed to read a inode header
failed to connect to 172.16.101.11:7000: No route to host
failed to connect 172.16.101.11:7000
failed to read a inode header

[...and even after ten minutes...]

0026# collie node list
   Idx - Host:Port          Vnodes       Zone
---------------------------------------------
     0 - 172.16.101.7:7000      64  124063916
     1 - 172.16.101.7:7001      64  124063916
     2 - 172.16.101.7:7002      64  124063916
     3 - 172.16.101.9:7000      64  157618348
     4 - 172.16.101.9:7001      64  157618348
     5 - 172.16.101.9:7002      64  157618348
     6 - 172.16.101.11:7000     64  191172780
     7 - 172.16.101.11:7001     64  191172780
     8 - 172.16.101.11:7002     64  191172780
0026# collie vdi list
  name        id    size    used  shared    creation time   vdi id
------------------------------------------------------------------
failed to connect to 172.16.101.11:7001: No route to host
failed to connect 172.16.101.11:7001
failed to read a inode header
failed to connect to 172.16.101.11:7000: No route to host
failed to connect 172.16.101.11:7000
failed to read a inode header

0026# collie vdi list
  name        id    size    used  shared    creation time   vdi id
------------------------------------------------------------------
failed to connect to 172.16.101.11:7001: No route to host
failed to connect 172.16.101.11:7001
failed to read a inode header
failed to connect to 172.16.101.11:7000: No route to host
failed to connect 172.16.101.11:7000
failed to read a inode header
0026# collie vdi list
  name        id    size    used  shared    creation time   vdi id
------------------------------------------------------------------
failed to connect to 172.16.101.11:7001: No route to host
failed to connect 172.16.101.11:7001
failed to read a inode header
failed to connect to 172.16.101.11:7000: No route to host
failed to connect 172.16.101.11:7000
failed to read a inode header
0026# collie node list
   Idx - Host:Port          Vnodes       Zone
---------------------------------------------
     0 - 172.16.101.7:7000      64  124063916
     1 - 172.16.101.7:7001      64  124063916
     2 - 172.16.101.7:7002      64  124063916
     3 - 172.16.101.9:7000      64  157618348
     4 - 172.16.101.9:7001      64  157618348
     5 - 172.16.101.9:7002      64  157618348
     6 - 172.16.101.11:7000     64  191172780
     7 - 172.16.101.11:7001     64  191172780
     8 - 172.16.101.11:7002     64  191172780
0026# collie node list
   Idx - Host:Port          Vnodes       Zone
---------------------------------------------
     0 - 172.16.101.7:7000      64  124063916
     1 - 172.16.101.7:7001      64  124063916
     2 - 172.16.101.7:7002      64  124063916
     3 - 172.16.101.9:7000      64  157618348
     4 - 172.16.101.9:7001      64  157618348
     5 - 172.16.101.9:7002      64  157618348
     6 - 172.16.101.11:7000     64  191172780
     7 - 172.16.101.11:7001     64  191172780
     8 - 172.16.101.11:7002     64  191172780
0026# collie node list
   Idx - Host:Port          Vnodes       Zone
---------------------------------------------
     0 - 172.16.101.7:7000      64  124063916
     1 - 172.16.101.7:7001      64  124063916
     2 - 172.16.101.7:7002      64  124063916
     3 - 172.16.101.9:7000      64  157618348
     4 - 172.16.101.9:7001      64  157618348
     5 - 172.16.101.9:7002      64  157618348
     6 - 172.16.101.11:7000     64  191172780
     7 - 172.16.101.11:7001     64  191172780
     8 - 172.16.101.11:7002     64  191172780
0026# echo /dev/sd[abcdefghijk]1
/dev/sda1 /dev/sdb1 /dev/sdc1
0026# echo /dev/sd[a-k]1
/dev/sda1 /dev/sdb1 /dev/sdc1
0026# collie node list
   Idx - Host:Port          Vnodes       Zone
---------------------------------------------
     0 - 172.16.101.7:7000      64  124063916
     1 - 172.16.101.7:7001      64  124063916
     2 - 172.16.101.7:7002      64  124063916
     3 - 172.16.101.9:7000      64  157618348
     4 - 172.16.101.9:7001      64  157618348
     5 - 172.16.101.9:7002      64  157618348
     6 - 172.16.101.11:7000     64  191172780
     7 - 172.16.101.11:7001     64  191172780
     8 - 172.16.101.11:7002     64  191172780
0026# collie vdi list
  name        id    size    used  shared    creation time   vdi id
------------------------------------------------------------------
failed to connect to 172.16.101.11:7001: No route to host
failed to connect 172.16.101.11:7001
failed to read a inode header
failed to connect to 172.16.101.11:7000: No route to host
failed to connect 172.16.101.11:7000
failed to read a inode header

I even powered off the 0028 machine to ensure I was fully isolating it, but the
cluster never recovers.
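
To make the check above repeatable, here is a minimal sketch of a helper that
counts how many rows of `collie node list` output still refer to a given host.
It assumes the output format shown in the transcript above; the function name
count_host_entries is hypothetical and not part of collie.

```shell
# Count how many entries in `collie node list` output (read from stdin)
# still refer to the given host IP. Hypothetical helper, not part of collie.
count_host_entries() {
    awk -v host="$1" '
        # Data rows look like: "     6 - 172.16.101.11:7000     64  191172780"
        $2 == "-" {
            split($3, hp, ":")      # hp[1] = IP address, hp[2] = port
            if (hp[1] == host) n++
        }
        END { print n + 0 }         # print 0 when no row matched
    '
}

# Possible use on a live cluster: poll once a minute until the host is gone.
#   while [ "$(collie node list | count_host_entries 172.16.101.11)" -gt 0 ]; do
#       sleep 60
#   done
```

With the node list shown above, the helper would report 3 for 172.16.101.11,
i.e. all three sheep on the isolated machine are still cluster members.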

Best wishes,

Chris.


