[sheepdog] [PATCH RFC] add a new flag of cluster SD_CLUSTER_FLAG_INODE_HASH_CHECK for checking inode object corruption
Hitoshi Mitake
mitake.hitoshi at gmail.com
Thu Jan 30 06:45:09 CET 2014
At Thu, 30 Jan 2014 12:48:18 +0800,
Liu Yuan wrote:
>
> On Thu, Jan 30, 2014 at 10:07:37AM +0800, Liu Yuan wrote:
> > On Thu, Jan 30, 2014 at 10:53:39AM +0900, Hitoshi Mitake wrote:
> > > At Thu, 30 Jan 2014 02:21:54 +0800,
> > > Liu Yuan wrote:
> > > >
> > > > On Thu, Jan 30, 2014 at 12:20:35AM +0900, Hitoshi Mitake wrote:
> > > > > From: Hitoshi Mitake <mitake.hitoshi at lab.ntt.co.jp>
> > > > >
> > > > > > Current sheepdog cannot handle corruption of inode objects. For
> > > > > > example, if members like name or nr_copies of sd_inode are broken by
> > > > > > silent data corruption of disks, even initialization of sheep
> > > > > > processes fails, because sheep and dog themselves interpret the
> > > > > > content of inode objects.
> > > >
> > > > Is there any source confirming the so-called 'silent data corruption'? Modern
> > > > disks have a built-in error correction code (RS) for each sector, so as far as I
> > > > know a disk returns either EIO or the full data. I've never seen a real 'silent
> > > > data corruption' in person. I know many people suspect it happens, but I think we
> > > > need real proof of it because most of the time it is a false positive.
> > >
> > > This paper is a major source on "silent data corruption":
> > > https://www.usenix.org/legacy/events/fast08/tech/full_papers/bairavasundaram/bairavasundaram.pdf
> > >
> > > Of course the corruption happens rarely. But it can happen so we
> > > should handle it. So we have the majority voting mechanism of "dog vdi
> > > check", no?
> > >
> > > >
> > > > > > To detect such corruption of inode objects, this patch adds a
> > > > > > new cluster flag, SD_CLUSTER_FLAG_INODE_HASH_CHECK. If the flag is
> > > > > > passed as an option of cluster format (dog cluster format -i), sheep
> > > > > > processes belonging to the cluster do the following:
> > > > >
> > > > > > - when the sheep updates an inode object, it stores the sha1 value of
> > > > > > the object in an xattr (default_write())
> > > > > > - when the sheep reads an inode object, it calculates the sha1 value of
> > > > > > the inode object. Then it compares the calculated value with the
> > > > > > stored one. If these values differ, the read returns an error
> > > > > > (default_read()).
> > > > >
> > > > > This checking mechanism prevents interpretation of corrupted inode
> > > > > objects by sheep.
> > > >
> > > > I don't think we should implement this check in the sheep. It's better to do
> > > > it in dog as a 'check' plugin because
> > > >
> > > > - no need to introduce an incompatible physical layout (extra xattr)
> > >
> > > This patch doesn't produce incompatibility. The used xattr is "user.obj.sha1".
> > >
>
> How do you handle the case where the sector that holds the xattr value is
> silently corrupted? Can this be a false positive, and how do you handle it?
Of course such a case is treated as an error. And it is recovered by
"dog vdi check".
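To make the behavior concrete, here is a minimal sketch of the check in
Python (not sheep's actual C code). It assumes an in-memory dict in place
of real xattrs; FakeObject, write_inode, read_inode and InodeCorruption
are hypothetical names used only for illustration. Only the xattr name
"user.obj.sha1" comes from the patch. Corrupted object data and a
corrupted xattr value both end in the same read error:

```python
import hashlib

XATTR_SHA1 = "user.obj.sha1"  # xattr name mentioned in this thread


class InodeCorruption(Exception):
    """Raised when the stored and the computed SHA-1 values differ."""


class FakeObject:
    """Hypothetical stand-in for one on-disk object: data plus xattrs."""

    def __init__(self):
        self.data = b""
        self.xattrs = {}  # assumption: a dict replaces real xattr storage


def write_inode(obj, data):
    # On update, store the SHA-1 of the new content next to the data.
    obj.data = data
    obj.xattrs[XATTR_SHA1] = hashlib.sha1(data).digest()


def read_inode(obj):
    # On read, recompute the SHA-1 and compare it with the stored value.
    # A missing xattr (stored is None) is also reported as a mismatch.
    stored = obj.xattrs.get(XATTR_SHA1)
    actual = hashlib.sha1(obj.data).digest()
    if stored != actual:
        raise InodeCorruption("sha1 mismatch")
    return obj.data
```

In both cases the read fails instead of handing back unverified data,
and the object is then left for "dog vdi check" to repair.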
Thanks,
Hitoshi