[sheepdog-users] Failing disk tests: disk not responding

Valerio Pachera sirio81 at gmail.com
Wed Oct 16 17:21:57 CEST 2013


Here I simulate a disk that physically breaks down, so that the OS
can't contact it anymore.
In that case we are not able to cleanly unplug and unmount the device.

If this were software RAID, it would mark the disk as "failed" and
continue working on the good one.

How is sheepdog going to behave?

# dog node md info
Id      Size    Used    Avail   Use%    Path
 0      220 GB  24 GB   196 GB   10%    /mnt/sheep/dsk01/obj
 1      149 GB  16 GB   133 GB   10%    /mnt/sheep/dsk03

# df -h | grep sheep
/dev/mapper/vg00-sheepdog  220G   24G    196G  11% /mnt/sheep/dsk01
/dev/sdc1                  149G   17G    133G  11% /mnt/sheep/dsk03

# echo 1 > /sys/block/sdc/device/delete

# ls /mnt/sheep/dsk03
ls: cannot access /mnt/sheep/dsk03: Input/output error

# less /var/log/sheep.log
Oct 16 16:50:51  ERROR [io 4151] for_each_object_in_path(175) failed to open /mnt/sheep/dsk03, Input/output error
Oct 16 16:58:31  ERROR [io 4151] for_each_object_in_path(175) failed to open /mnt/sheep/dsk03, Input/output error
Oct 16 16:58:43  ERROR [io 4151] md_access(457) failed to check /mnt/sheep/dsk03/007ab62200000020, Input/output error
Oct 16 16:58:43  ERROR [io 4151] md_access(457) failed to check /mnt/sheep/dsk03/007ab62200000020, Input/output error
Oct 16 16:58:43  ERROR [io 4151] md_access(457) failed to check /mnt/sheep/dsk03/.stale/007ab62200000020.1, Input/output error
Oct 16 16:58:43  ERROR [io 4151] md_access(457) failed to check /mnt/sheep/dsk03/.stale/007ab62200000020.1, Input/output error
Oct 16 16:58:43  ERROR [main] modify_event(156) event info for fd 25 not found


# dog node md info
Id      Size    Used    Avail   Use%    Path
 0      220 GB  24 GB   196 GB   10%    /mnt/sheep/dsk01/obj
 1      0.0 MB  0.0 MB  0.0 MB  -2147483648%    /mnt/sheep/dsk03
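
The failed disk shows up above with a size of 0.0 MB and a meaningless
Use% value. As a rough sketch, that condition could be detected from the
table with a small filter. The function name failed_md_disks is made up
for this example, and it assumes the column layout shown above and paths
without spaces:

```shell
# Print the path of every multi-disk entry whose Size column is zero.
# Reads the output of `dog node md info` on stdin.
failed_md_disks() {
    # Data rows start with a numeric Id; $2 is the numeric part of Size
    # and the last field is the path.
    awk '$1 ~ /^[0-9]+$/ && $2 + 0 == 0 { print $NF }'
}
```

It would be used as: dog node md info | failed_md_disks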


(on another node)
# dog vdi check squeeze1
 22.0 % [=====================================================>                                     ] 2.2 GB / 10 GB
failed to read 7ab62200000020 from 192.168.2.47:7000, I/O error



After some time I noticed this in the log:

Oct 16 17:14:07  ERROR [gway 4150] err_to_sderr(95) oid=8036657100000000, Input/output error
Oct 16 17:14:07  ERROR [gway 4150] gateway_replication_read(268) local read 8036657100000000 failed, Network error between sheep
Oct 16 17:14:07   INFO [main] md_remove_disk(316) /mnt/sheep/dsk03 from multi-disk array

# dog node md info
Id      Size    Used    Avail   Use%    Path
 0      220 GB  29 GB   191 GB   13%    /mnt/sheep/dsk01/obj
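
To put a number on how long the removal took in a given run, the delay
between the first I/O error and the md_remove_disk message can be read
from the log. This is only a sketch: md_removal_delay is a made-up name,
and it assumes each log entry sits on a single line and that both events
fall on the same day. It measures one run and does not by itself show
whether a fixed timeout exists. Note that in the log above the removal is
logged in the same second as a failed local read.

```shell
# Print the number of seconds between the first "Input/output error"
# entry and the first md_remove_disk entry of a sheep log read on stdin.
md_removal_delay() {
    awk '
        # Convert HH:MM:SS to seconds since midnight.
        function secs(t,   p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }
        /Input\/output error/ && !seen { first = secs($3); seen = 1 }
        /md_remove_disk/ && seen       { print secs($3) - first; exit }
    '
}
```

It would be used as: md_removal_delay < /var/log/sheep.log
For the timestamps in this test (16:50:51 and 17:14:07) it prints 1396,
i.e. a bit over 23 minutes.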


My guests were not running, so I can't tell whether they would have frozen.
I might repeat the test tomorrow.

I would like to know whether there is a fixed timeout before sheep
unplugs the device, or what else triggers the removal.

Sheepdog daemon version 0.7.0_144_g4f3d3e2


