[sheepdog] [PATCH RFC] add a new flag of cluster SD_CLUSTER_FLAG_INODE_HASH_CHECK for checking inode object corruption
Liu Yuan
namei.unix at gmail.com
Wed Jan 29 19:21:54 CET 2014
On Thu, Jan 30, 2014 at 12:20:35AM +0900, Hitoshi Mitake wrote:
> From: Hitoshi Mitake <mitake.hitoshi at lab.ntt.co.jp>
>
> Current sheepdog cannot handle corruption of inode objects. For
> example, if members of sd_inode such as name or nr_copies are broken
> by silent data corruption on disk, even initialization of sheep
> processes fails, because sheep and dog themselves interpret the
> content of inode objects.
Is there any resource to confirm this so-called 'silent data corruption'? Modern
disks have built-in error correction codes (Reed-Solomon) for each sector, so as
far as I know a read returns either EIO or the full data. I've never seen real
'silent data corruption' in person. I know many people suspect it can happen, but
I think we need real proof of it, because most of the time it is a false positive.
> To detect such corruption of inode objects, this patch adds a new
> cluster flag, SD_CLUSTER_FLAG_INODE_HASH_CHECK. If the flag is
> passed as an option of cluster format (dog cluster format -i), sheep
> processes belonging to the cluster do the following:
>
> - when the sheep updates an inode object, it stores the sha1 value of
> the object in an xattr (default_write())
> - when the sheep reads an inode object, it calculates the sha1 value
> of the inode object and compares the calculated value with the
> stored one. If the values differ, the read fails with an error
> (default_read()).
>
> This checking mechanism prevents sheep from interpreting corrupted
> inode objects.
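(For reference, the write/read paths described above would look roughly like the
sketch below. This is only an illustration, assuming Linux xattrs and OpenSSL's
SHA1(); the xattr name and helper names are mine, not from the patch.)

/*
 * Illustrative sketch only -- not the actual patch. Assumes Linux xattrs
 * and OpenSSL's SHA1(); the xattr name "user.inode_sha1" and the helper
 * names are made up for this example.
 */
#include <sys/xattr.h>
#include <openssl/sha.h>
#include <string.h>
#include <errno.h>

#define INODE_HASH_XATTR "user.inode_sha1"

/* Write path: store the SHA-1 of the inode object alongside it. */
static int store_inode_hash(int fd, const void *buf, size_t len)
{
	unsigned char sha1[SHA_DIGEST_LENGTH];

	SHA1(buf, len, sha1);
	if (fsetxattr(fd, INODE_HASH_XATTR, sha1, sizeof(sha1), 0) < 0)
		return -errno;
	return 0;
}

/* Read path: recompute the SHA-1 and compare with the stored value. */
static int verify_inode_hash(int fd, const void *buf, size_t len)
{
	unsigned char stored[SHA_DIGEST_LENGTH], now[SHA_DIGEST_LENGTH];
	ssize_t ret;

	ret = fgetxattr(fd, INODE_HASH_XATTR, stored, sizeof(stored));
	if (ret != sizeof(stored))
		return -EIO;	/* missing or truncated hash */

	SHA1(buf, len, now);
	if (memcmp(stored, now, sizeof(now)) != 0)
		return -EIO;	/* hash mismatch: inode object is corrupted */
	return 0;
}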
I don't think we should implement this check in the sheep. It's better to do
it in dog as a 'check' plugin, because:
- there is no need to introduce an incompatible physical layout (extra xattr)
- dog is in a better position to do quorum-based recovery. Even if all the
per-copy hashes are valid, you can't be sure the different copies are
consistent with each other (a rough sketch of that idea follows below).
- as you mentioned, the performance hit is awful. Checking only inodes is
basically useless; users care about their data, not our internal metadata. If
we extend the check to data objects, performance becomes unacceptably bad.
- the current backend store (plain store + MD) is far more complex than it
appears. I recently spent days fixing a tricky data loss bug, so I don't want
to add more tricky code to it without compelling reasons. Right now we should
keep the backend store 100% reliable before we do anything else.
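To illustrate what I mean by quorum: instead of trusting a per-copy hash, dog
can hash every replica of an inode object and only trust the value that a
majority of copies agrees on. A rough, standalone sketch (not dog's actual
plugin API; function and parameter names are illustrative):

#include <openssl/sha.h>
#include <string.h>

#define MAX_SKETCH_COPIES 16

/* Return the index of a copy that a strict majority agrees with, or -1. */
static int pick_majority_copy(const void *copies[], size_t len, int nr_copies)
{
	unsigned char sha1[MAX_SKETCH_COPIES][SHA_DIGEST_LENGTH];
	int i, j, votes;

	if (nr_copies > MAX_SKETCH_COPIES)
		return -1;

	for (i = 0; i < nr_copies; i++)
		SHA1(copies[i], len, sha1[i]);

	for (i = 0; i < nr_copies; i++) {
		votes = 0;
		for (j = 0; j < nr_copies; j++)
			if (memcmp(sha1[i], sha1[j], SHA_DIGEST_LENGTH) == 0)
				votes++;
		if (votes * 2 > nr_copies)
			return i;	/* majority agrees with copy i */
	}
	return -1;	/* no quorum: the copies have diverged */
}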
Thanks
Yuan