[sheepdog-users] Collie node kill and md

Valerio Pachera sirio81 at gmail.com
Wed May 22 09:07:11 CEST 2013


Hi, on my production cluster I killed one of the 3 nodes and restarted
sheep on it right after.
(Sheepdog daemon version 0.5.5_335_g25a93bf)

root@sheepdog004:~# collie node list
M   Id   Host:Port         V-Nodes       Zone
-    0   192.168.6.41:7000      85  688302272
-    1   192.168.6.42:7000      85  705079488
-    2   192.168.6.44:7000      21  738633920

root@sheepdog004:~# collie node info
Id      Size    Used    Use%
 0      1.6 TB  1.0 TB   64%
 1      1.6 TB  978 GB   57%
 2      2.1 TB  236 GB   10%
Total   5.4 TB  2.2 TB   41%
Total virtual image size        1.2 TB

root@sheepdog004:~# collie node kill 2

root@sheepdog004:~# sheep -w size=20000 /mnt/wd_WCAYUEP99298,/mnt/wd_WCAYUEP99298/obj,/mnt/wd_WCAWZ1588874

root@sheepdog004:~# collie node info
Id      Size    Used    Use%
 0      1.6 TB  1.0 TB   64%
 1      1.6 TB  978 GB   57%
 2      466 GB  72 MB     0%
Total   3.7 TB  2.0 TB   53%

root@sheepdog004:~# collie node md info
Id      Size    Use     Path
 0      422 GB  0.0 MB  /mnt/wd_WCAYUEP99298/obj
 1      1.6 TB  980 MB  /mnt/wd_WCAWZ1588874

root@sheepdog004:~# collie node recovery
Nodes In Recovery:
  Id   Host:Port         V-Nodes       Zone
   2   192.168.6.44:7000      21  738633920

sheep.log
May 22 08:54:32 [main] main(752) shutdown
May 22 08:54:38 [main] md_add_disk(164) /mnt/wd_WCAYUEP99298/obj, nr 1
May 22 08:54:38 [main] md_add_disk(164) /mnt/wd_WCAWZ1588874, nr 2
May 22 08:54:38 [main] send_join_request(1082) IPv4 ip:192.168.6.44 port:7000
May 22 08:54:38 [main] check_host_env(381) WARN: Allowed open files 1024 too small, suggested 1024000
May 22 08:54:38 [main] check_host_env(390) Allowed core file size 0, suggested unlimited
May 22 08:54:38 [main] main(745) sheepdog daemon (version 0.5.5_335_g25a93bf) started
May 22 08:54:38 [main] update_cluster_info(862) status = 1, epoch = 4, finished: 0
May 22 08:54:40 [rw 17255] recover_object_work(205) done:0 count:60534, oid:c8d1280002992d
May 22 08:54:42 [rw 17255] recover_object_work(205) done:1 count:60534, oid:c8d1280000081f
May 22 08:54:43 [rw 17255] recover_object_work(205) done:2 count:60534, oid:c8d1280003c3d0
...
May 22 08:54:49 [gway 17253] gateway_read_obj(60) local read 80c8be4d00000000 failed, No object found
May 22 08:54:49 [gway 17253] gateway_read_obj(60) local read 80e149bf00000000 failed, No object found
May 22 08:54:49 [rw 17255] recover_object_work(205) done:19 count:60534, oid:c8d12800018e38
...
May 22 08:55:16 [gway 17253] gateway_read_obj(60) local read 80c8be4d00000000 failed, No object found
May 22 08:55:16 [gway 17253] gateway_read_obj(60) local read 80e149bf00000000 failed, No object found
May 22 08:55:16 [rw 17255] recover_object_work(205) done:109 count:60534, oid:c8d1280000ff6b
...
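
For anyone who wants to follow how far recovery has progressed from
sheep.log, here is a rough sketch in Python. It assumes only the
"done:N count:M" fields of the recover_object_work entries quoted above;
the function name recovery_progress is made up for illustration and is
not part of sheepdog or collie.

```python
import re

# Matches the progress fields of a recover_object_work log entry.
# \s+ also tolerates a line wrap between "done:N" and "count:M".
PROGRESS = re.compile(r"recover_object_work\(\d+\)\s+done:(\d+)\s+count:(\d+)")


def recovery_progress(log_text):
    """Return (done, count, percent) for the last progress entry, or None.

    Hypothetical helper: it only reads the log text it is given and makes
    no assumption about what sheep does beyond the fields shown in the log.
    """
    matches = PROGRESS.findall(log_text)
    if not matches:
        return None
    done, count = (int(value) for value in matches[-1])
    percent = 100.0 * done / count if count else 0.0
    return done, count, percent


# Last progress entry from the excerpt above, wrapped as the mail client did.
sample = """May 22 08:55:16 [rw 17255] recover_object_work(205) done:109
count:60534, oid:c8d1280000ff6b"""

if __name__ == "__main__":
    print(recovery_progress(sample))
```

On the excerpt above this reports 109 of 60534 objects, so recovery had
only just started when the log was captured.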


What do you think?
Is everything messed up?

