[sheepdog-users] Failing disk tests: disk not responding
Liu Yuan
namei.unix at gmail.com
Thu Oct 17 18:44:42 CEST 2013
On Wed, Oct 16, 2013 at 05:21:57PM +0200, Valerio Pachera wrote:
> Here I simulate the situation of a disk that physically breaks down
> and the OS can no longer reach it.
> So we are not able to cleanly unplug and unmount the device.
>
> If this were software RAID, it would mark the disk as "failed" and
> continue working on the good one.
>
> How is sheepdog going to behave?
>
> # dog node md info
> Id Size Used Avail Use% Path
> 0 220 GB 24 GB 196 GB 10% /mnt/sheep/dsk01/obj
> 1 149 GB 16 GB 133 GB 10% /mnt/sheep/dsk03
>
> # df -h | grep sheep
> /dev/mapper/vg00-sheepdog 220G 24G 196G 11% /mnt/sheep/dsk01
> /dev/sdc1 149G 17G 133G 11% /mnt/sheep/dsk03
>
> # echo 1 > /sys/block/sdc/device/delete
>
> # ls /mnt/sheep/dsk03
> ls: cannot access /mnt/sheep/dsk03: Input/output error
>
> # less /var/log/sheep.log
> Oct 16 16:50:51 ERROR [io 4151] for_each_object_in_path(175) failed to open /mnt/sheep/dsk03, Input/output error
> Oct 16 16:58:31 ERROR [io 4151] for_each_object_in_path(175) failed to open /mnt/sheep/dsk03, Input/output error
> Oct 16 16:58:43 ERROR [io 4151] md_access(457) failed to check /mnt/sheep/dsk03/007ab62200000020, Input/output error
> Oct 16 16:58:43 ERROR [io 4151] md_access(457) failed to check /mnt/sheep/dsk03/007ab62200000020, Input/output error
> Oct 16 16:58:43 ERROR [io 4151] md_access(457) failed to check /mnt/sheep/dsk03/.stale/007ab62200000020.1, Input/output error
> Oct 16 16:58:43 ERROR [io 4151] md_access(457) failed to check /mnt/sheep/dsk03/.stale/007ab62200000020.1, Input/output error
> Oct 16 16:58:43 ERROR [main] modify_event(156) event info for fd 25 not found
>
>
> # dog node md info
> Id Size Used Avail Use% Path
> 0 220 GB 24 GB 196 GB 10% /mnt/sheep/dsk01/obj
> 1 0.0 MB 0.0 MB 0.0 MB -2147483648% /mnt/sheep/dsk03
>
>
> (on another node)
> # dog vdi check squeeze1
> 22.0 % [=====================================================> ] 2.2 GB / 10 GB
> failed to read 7ab62200000020 from 192.168.2.47:7000, I/O error
>
> After some time I noticed:
>
> Oct 16 17:14:07 ERROR [gway 4150] err_to_sderr(95) oid=8036657100000000, Input/output error
> Oct 16 17:14:07 ERROR [gway 4150] gateway_replication_read(268) local read 8036657100000000 failed, Network error between sheep
> Oct 16 17:14:07 INFO [main] md_remove_disk(316) /mnt/sheep/dsk03 from multi-disk array
>
> # dog node md info
> Id Size Used Avail Use% Path
> 0 220 GB 29 GB 191 GB 13% /mnt/sheep/dsk01/obj
>
>
> My guests were not running, so I can't tell you if they were going to freeze.
> I might repeat the test tomorrow.
>
> I would like to know if there's a fixed timeout before sheep
> unplugs the device, or what else triggers it.
There is no timeout. Only one event triggers the automatic unplug: an EIO
returned by the broken disk when a client accesses it.
Guests will not freeze when disks managed by sheepdog break; this is the
bottom line a distributed system should provide.
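To make the trigger concrete, here is a toy Python model of the behaviour described above. It is a sketch, not sheepdog's code (sheepdog is written in C), and every name in it (Disk, MultiDiskNode, read_replica, unplugged) is hypothetical. Two simplifications are assumptions: object placement is reduced to a dictionary lookup, and the recovery path is modelled as "serve the read from a replica on another node", which is only inferred from the gateway log lines in the report. What it does show is that a broken disk stays in the array until an access returns EIO; that EIO, not a timer, is what removes it.

```python
import errno


class Disk:
    """Hypothetical stand-in for one local disk, e.g. /mnt/sheep/dsk03."""

    def __init__(self, path, objects):
        self.path = path
        self.objects = dict(objects)  # oid -> data
        self.broken = False  # set True to simulate the device vanishing

    def read(self, oid):
        if self.broken:
            # Like the kernel after the device is deleted: any access fails with EIO.
            raise OSError(errno.EIO, "Input/output error", self.path)
        return self.objects[oid]


class MultiDiskNode:
    """Hypothetical model of one sheep node with a multi-disk (md) array."""

    def __init__(self, disks, read_replica):
        self.disks = list(disks)
        self.read_replica = read_replica  # assumed: fetch oid from another node
        self.unplugged = []  # paths removed from the array, in order

    def _disk_for(self, oid):
        # Simplification: real placement maps objects onto disks;
        # here we just find the disk that holds the object.
        for disk in self.disks:
            if oid in disk.objects:
                return disk
        return None

    def read_object(self, oid):
        disk = self._disk_for(oid)
        if disk is None:
            # Not on any remaining local disk: go to another replica.
            return self.read_replica(oid)
        try:
            return disk.read(oid)
        except OSError as e:
            if e.errno != errno.EIO:
                raise
            # The EIO itself is the trigger: unplug the disk right away,
            # no timeout and no background health probe involved.
            self.disks.remove(disk)
            self.unplugged.append(disk.path)
            # Serve the request elsewhere so the caller does not hang.
            return self.read_replica(oid)
```

For example, with dsk03 marked broken, the node still lists two disks until the first read of an object stored on dsk03; that read returns the replica's data and leaves "/mnt/sheep/dsk03" in node.unplugged, mirroring the md_remove_disk message in the log.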
Thanks
Yuan