[sheepdog-users] Failing disk tests: disk not responding
Valerio Pachera
sirio81 at gmail.com
Wed Oct 16 17:21:57 CEST 2013
Here I simulate a disk that physically breaks down, so that the OS
can no longer contact it.
In this situation we are not able to cleanly unplug and unmount the device.
If this were software RAID, it would mark the disk as "failed" and
continue working on the good one.
How is sheepdog going to behave?
# dog node md info
Id Size Used Avail Use% Path
0 220 GB 24 GB 196 GB 10% /mnt/sheep/dsk01/obj
1 149 GB 16 GB 133 GB 10% /mnt/sheep/dsk03
# df -h | grep sheep
/dev/mapper/vg00-sheepdog 220G 24G 196G 11% /mnt/sheep/dsk01
/dev/sdc1 149G 17G 133G 11% /mnt/sheep/dsk03
# echo 1 > /sys/block/sdc/device/delete
# ls /mnt/sheep/dsk03
ls: cannot access /mnt/sheep/dsk03: Input/output error
# less /var/log/sheep.log
Oct 16 16:50:51 ERROR [io 4151] for_each_object_in_path(175) failed to open /mnt/sheep/dsk03, Input/output error
Oct 16 16:58:31 ERROR [io 4151] for_each_object_in_path(175) failed to open /mnt/sheep/dsk03, Input/output error
Oct 16 16:58:43 ERROR [io 4151] md_access(457) failed to check /mnt/sheep/dsk03/007ab62200000020, Input/output error
Oct 16 16:58:43 ERROR [io 4151] md_access(457) failed to check /mnt/sheep/dsk03/007ab62200000020, Input/output error
Oct 16 16:58:43 ERROR [io 4151] md_access(457) failed to check /mnt/sheep/dsk03/.stale/007ab62200000020.1, Input/output error
Oct 16 16:58:43 ERROR [io 4151] md_access(457) failed to check /mnt/sheep/dsk03/.stale/007ab62200000020.1, Input/output error
Oct 16 16:58:43 ERROR [main] modify_event(156) event info for fd 25 not found
# dog node md info
Id Size Used Avail Use% Path
0 220 GB 24 GB 196 GB 10% /mnt/sheep/dsk01/obj
1 0.0 MB 0.0 MB 0.0 MB -2147483648% /mnt/sheep/dsk03
(on another node)
# dog vdi check squeeze1
22.0 % [=====================================================> ] 2.2 GB / 10 GB
failed to read 7ab62200000020 from 192.168.2.47:7000, I/O error
After some time I noticed this in sheep.log:
Oct 16 17:14:07 ERROR [gway 4150] err_to_sderr(95) oid=8036657100000000, Input/output error
Oct 16 17:14:07 ERROR [gway 4150] gateway_replication_read(268) local read 8036657100000000 failed, Network error between sheep
Oct 16 17:14:07 INFO [main] md_remove_disk(316) /mnt/sheep/dsk03 from multi-disk array
# dog node md info
Id Size Used Avail Use% Path
0 220 GB 29 GB 191 GB 13% /mnt/sheep/dsk01/obj
My guests were not running, so I can't tell whether they would have frozen.
I might repeat the test tomorrow.
I would like to know whether there is a fixed timeout before sheep
unplugs the device, or what else triggers the removal.
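
To measure how long the removal takes in a repeat of this test, something
like the following could be used. It is only a sketch: the function names
md_has_path and wait_for_md_removal are made up for illustration, and it
assumes that `dog node md info` keeps printing the path in the last column,
as in the output above. It polls once per second and prints the number of
seconds until the path disappears from the list.

```shell
#!/bin/sh
# Hypothetical helper (not part of sheepdog): reads `dog node md info`
# output on stdin and succeeds if the given path is still listed.
md_has_path() {
    # $1: md path to look for, e.g. /mnt/sheep/dsk03
    # NR > 1 skips the header line; $NF is the last column (Path).
    awk -v p="$1" 'NR > 1 && $NF == p { found = 1 } END { exit !found }'
}

# Hypothetical helper: polls once per second until the path is no longer
# listed, then prints the elapsed seconds.
# Assumes `dog` is on PATH and talks to the local sheep daemon.
wait_for_md_removal() {
    path=$1
    start=$(date +%s)
    while dog node md info | md_has_path "$path"; do
        sleep 1
    done
    echo "$(( $(date +%s) - start ))"
}
```

Usage would be to delete the device and then call
`wait_for_md_removal /mnt/sheep/dsk03` right away. Note that in the log
above md_remove_disk appears at the same second as a failed local read, so
the loop may keep running until something actually accesses an object on
the broken disk.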
Sheepdog daemon version 0.7.0_144_g4f3d3e2