[sheepdog] [PATCH v2 0/5] using disks to generate vnodes instead of nodes
Robin Dong
robin.k.dong at gmail.com
Fri May 16 10:17:20 CEST 2014
Hi Kazutaka and Hitoshi,
Could you give some suggestions on this patchset?
2014-05-09 16:18 GMT+08:00 Liu Yuan <namei.unix at gmail.com>:
> On Wed, May 07, 2014 at 06:25:37PM +0800, Robin Dong wrote:
> > From: Robin Dong <sanbai at taobao.com>
> >
> > When a disk fails in a sheepdog cluster, recovery currently moves data
> > only within one node. This process is very slow if the corrupted disk is
> > very large (for example, 4TB).
> >
> > For example, suppose the cluster has three nodes (A, B and C), every node
> > has two disks, and every disk is 4TB. The cluster uses 8:4 erasure coding.
> > When a disk on node A is corrupted, node A tries to fetch 8 copies to
> > regenerate each corrupted piece of data. To regenerate 4TB of data, it
> > fetches 4 * 8 = 32TB of data from remote nodes, which is very inefficient.
> >
> > The solution to speed up recovery is to generate vnodes from disks, so
> > that the failure of one disk causes the whole cluster to reweight and
> > move data.
> >
> > In the example above, all the vnodes in the hash ring are generated from
> > disks. Therefore, when a disk is gone, all the vnodes after it do the
> > recovery work; that is, almost all the disks in the cluster take on a
> > share of the 4TB of data. The cluster uses 5 disks to store the
> > regenerated data, so each disk only needs to receive 4 / 5 = 0.8TB of data.
> >
>
> Kazutaka and Hitoshi, any comments? Providing a means for users to configure
> a disk instead of the whole node as the basic ring unit might be of interest
> to some users who care more about recovery performance.
>
> Thanks
> Yuan
>
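To illustrate the numbers in the quoted cover letter, here is a minimal
Python sketch of a hash ring whose vnodes are generated per disk. It is not
sheepdog code: the names ring_hash, build_ring, owner and simulate, the use of
SHA-1, the 128 vnodes per disk, and the one-owner-per-object placement are all
assumptions made for illustration (real placement with erasure coding picks
several vnodes per object). It only shows that when one disk leaves the ring,
its objects are spread over the remaining disks and nothing else moves.

```python
import hashlib
from bisect import bisect_right
from collections import Counter


def ring_hash(key: str) -> int:
    """64-bit ring position (SHA-1 is a stand-in, not sheepdog's hash)."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


def build_ring(units, vnodes_per_unit=128):
    """Return a sorted list of (position, unit) pairs, one pair per vnode."""
    return sorted((ring_hash(f"{u}#{i}"), u)
                  for u in units for i in range(vnodes_per_unit))


def owner(ring, positions, obj_id: str) -> str:
    """Unit that owns the first vnode clockwise from the object's position."""
    idx = bisect_right(positions, ring_hash(obj_id)) % len(ring)
    return ring[idx][1]


def simulate(num_objects=20000):
    # Three nodes (A, B, C) with two disks each, as in the cover letter.
    disks = [f"{node}/disk{d}" for node in "ABC" for d in (0, 1)]
    failed = "A/disk0"
    before = build_ring(disks)
    after = build_ring([d for d in disks if d != failed])
    pos_before = [p for p, _ in before]
    pos_after = [p for p, _ in after]

    receivers = Counter()   # where the failed disk's objects land
    moved_elsewhere = 0     # objects that moved although their disk survived
    for i in range(num_objects):
        obj = f"obj-{i}"
        old = owner(before, pos_before, obj)
        new = owner(after, pos_after, obj)
        if old == failed:
            receivers[new] += 1
        elif old != new:
            moved_elsewhere += 1
    return disks, failed, receivers, moved_elsewhere


# Figures taken from the cover letter.
disk_tb, copies_needed, surviving_disks = 4, 8, 5
fetched_tb = disk_tb * copies_needed       # 32 TB fetched by node A today
per_disk_tb = disk_tb / surviving_disks    # 0.8 TB received per surviving disk

if __name__ == "__main__":
    disks, failed, receivers, moved_elsewhere = simulate()
    print(f"node-level recovery fetches {fetched_tb} TB")
    print(f"disk-level vnodes: about {per_disk_tb} TB per surviving disk")
    for disk, count in sorted(receivers.items()):
        print(f"{disk} receives {count} objects from {failed}")
    print(f"objects moved between surviving disks: {moved_elsewhere}")
```

The split between surviving disks is only roughly even in such a sketch; how
even it is depends on the number of vnodes per disk.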
--
Best Regards
Robin Dong