[sheepdog-users] Collie node kill and md
Liu Yuan
namei.unix at gmail.com
Wed May 22 19:06:29 CEST 2013
On 05/22/2013 03:07 PM, Valerio Pachera wrote:
> Hi, on my production cluster I tried to kill one of the 3 nodes and
> restart sheep right after.
> (Sheepdog daemon version 0.5.5_335_g25a93bf)
>
> root at sheepdog004:~# collie node list
> M Id Host:Port V-Nodes Zone
> - 0 192.168.6.41:7000 85 688302272
> - 1 192.168.6.42:7000 85 705079488
> - 2 192.168.6.44:7000 21 738633920
>
> root at sheepdog004:~# collie node info
> Id Size Used Use%
> 0 1.6 TB 1.0 TB 64%
> 1 1.6 TB 978 GB 57%
> 2 2.1 TB 236 GB 10%
> Total 5.4 TB 2.2 TB 41%
> Total virtual image size 1.2 TB
>
> root at sheepdog004:~# collie node kill 2
>
> root at sheepdog004:~# sheep -w size=20000
> /mnt/wd_WCAYUEP99298,/mnt/wd_WCAYUEP99298/obj,/mnt/wd_WCAWZ1588874
>
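For reference, judging only from the md_add_disk lines in your log below, the first path is taken as the base working directory and the comma-separated paths after it are registered as md disks. A minimal sketch of that invocation, with placeholder paths:

  # base dir first, then the disks to register as md disks
  # (/mnt/base, /mnt/disk1, /mnt/disk2 are placeholders)
  sheep -w size=20000 /mnt/base,/mnt/disk1,/mnt/disk2
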
> root at sheepdog004:~# collie node info
> Id Size Used Use%
> 0 1.6 TB 1.0 TB 64%
> 1 1.6 TB 978 GB 57%
> 2 466 GB 72 MB 0%
> Total 3.7 TB 2.0 TB 53%
>
Node 2 doesn't show the correct size: node info reports only 466 GB, while the md disks listed below add up to roughly 2.0 TB. Looks like a bug.
> root at sheepdog004:~# collie node md info
> Id Size Use Path
> 0 422 GB 0.0 MB /mnt/wd_WCAYUEP99298/obj
> 1 1.6 TB 980 MB /mnt/wd_WCAWZ1588874
>
> root at sheepdog004:~# collie node recovery
> Nodes In Recovery:
> Id Host:Port V-Nodes Zone
> 2 192.168.6.44:7000 21 738633920
>
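If you want to watch recovery progress from the shell, a minimal sketch (it assumes the node rows always contain a 192.168.x address, as in your output above; the 10 second interval is arbitrary):

  # loop until "collie node recovery" no longer lists any node
  while collie node recovery | grep -q '192\.168\.'; do
      sleep 10
  done
  echo recovery finished
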
> sheep.log
> May 22 08:54:32 [main] main(752) shutdown
> May 22 08:54:38 [main] md_add_disk(164) /mnt/wd_WCAYUEP99298/obj, nr 1
> May 22 08:54:38 [main] md_add_disk(164) /mnt/wd_WCAWZ1588874, nr 2
> May 22 08:54:38 [main] send_join_request(1082) IPv4 ip:192.168.6.44 port:7000
> May 22 08:54:38 [main] check_host_env(381) WARN: Allowed open files 1024 too small, suggested 1024000
> May 22 08:54:38 [main] check_host_env(390) Allowed core file size 0, suggested unlimited
> May 22 08:54:38 [main] main(745) sheepdog daemon (version 0.5.5_335_g25a93bf) started
> May 22 08:54:38 [main] update_cluster_info(862) status = 1, epoch = 4, finished: 0
> May 22 08:54:40 [rw 17255] recover_object_work(205) done:0 count:60534, oid:c8d1280002992d
> May 22 08:54:42 [rw 17255] recover_object_work(205) done:1 count:60534, oid:c8d1280000081f
> May 22 08:54:43 [rw 17255] recover_object_work(205) done:2 count:60534, oid:c8d1280003c3d0
> ...
> May 22 08:54:49 [gway 17253] gateway_read_obj(60) local read 80c8be4d00000000 failed, No object found
> May 22 08:54:49 [gway 17253] gateway_read_obj(60) local read 80e149bf00000000 failed, No object found
> May 22 08:54:49 [rw 17255] recover_object_work(205) done:19 count:60534, oid:c8d12800018e38
> ...
> May 22 08:55:16 [gway 17253] gateway_read_obj(60) local read 80c8be4d00000000 failed, No object found
> May 22 08:55:16 [gway 17253] gateway_read_obj(60) local read 80e149bf00000000 failed, No object found
> May 22 08:55:16 [rw 17255] recover_object_work(205) done:109 count:60534, oid:c8d1280000ff6b
> ...
>
>
> What do you think?
> Is everything messed up?
>
Not yet. If you see "failed to recover object xxx" in the log, then the objects are lost.
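A quick way to check is to grep the log for that message (adjust the path to wherever your sheep.log lives):

  # any hit means an object could not be recovered
  grep 'failed to recover' sheep.log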
Thanks,
Yuan