[sheepdog] [PATCH v2 0/5] using disks to generate vnodes instead of nodes
Hitoshi Mitake
mitake.hitoshi at gmail.com
Fri May 16 16:37:13 CEST 2014
At Fri, 16 May 2014 16:17:20 +0800,
Robin Dong wrote:
>
> Hi, Kazutaka and Hitoshi,
>
> Could you give some suggestions about this patchset?
I really like your idea. It should be very useful for machines with a
bunch of disks. Of course there are some points for improvement
(e.g. the introduced #ifdefs should be removed in the future), but it
seems to be a good first step.
Reviewed-by: Hitoshi Mitake <mitake.hitoshi at lab.ntt.co.jp>
BTW, I have one request: could you update "dog cluster info" to print
information about node changes?
Thanks,
Hitoshi
>
>
> 2014-05-09 16:18 GMT+08:00 Liu Yuan <namei.unix at gmail.com>:
>
> > On Wed, May 07, 2014 at 06:25:37PM +0800, Robin Dong wrote:
> > > From: Robin Dong <sanbai at taobao.com>
> > >
> > > When a disk fails in a sheepdog cluster, at present only the data on
> > > one node is moved to recover it. This process is very slow if the
> > > corrupted disk is very large (for example, 4TB).
> > >
> > > For example, suppose the cluster has three nodes (A, B and C), every
> > > node has two disks, and every disk is 4TB. The cluster uses an 8:4
> > > erasure code. When a disk on node A is corrupted, node A has to fetch
> > > 8 strips to regenerate each piece of corrupted data. To regenerate the
> > > 4TB of data, it fetches 4 * 8 = 32TB from remote nodes, which is very
> > > inefficient.
> > >
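The fetch cost above, restated as a tiny illustrative program (the
numbers simply mirror the example in the mail):

#include <stdio.h>

int main(void)
{
	double disk_tb = 4.0;  /* size of the failed disk */
	int strips_read = 8;   /* strips fetched per rebuilt strip (8:4 EC) */

	/* node A alone fetches 4 * 8 = 32TB to rebuild one 4TB disk */
	printf("fetched by node A: %.0f TB\n", disk_tb * strips_read);
	return 0;
}
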
> > > The solution to accelerate recovery is to use disks to generate the
> > > vnodes, so that the failure of one disk causes the whole cluster to
> > > reweight and move data.
> > >
> > > Taking the example above, all the vnodes in the hash ring are
> > > generated from disks. Therefore, when a disk is gone, all the vnodes
> > > after it take part in the recovery work; that is, almost all the disks
> > > in the cluster share the 4TB of data. In other words, the cluster uses
> > > the 5 remaining disks to store the regenerated data, so each disk only
> > > needs to receive 4 / 5 = 0.8TB.
> > >
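If I read the patchset right, the core change amounts to placing
(node, disk, index) tuples on the hash ring instead of per-node
tuples, so a dead disk's objects are re-spread over every surviving
disk (4 / 5 = 0.8TB each in this example). A minimal sketch of that
idea; struct vnode, hash64() and VNODES_PER_DISK are illustrative
names, not the actual Sheepdog code:

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

#define VNODES_PER_DISK 64

struct vnode {
	uint64_t hash;     /* position on the hash ring */
	uint32_t node_id;  /* owning node */
	uint32_t disk_id;  /* owning disk within that node */
};

/* 64-bit FNV-1a, a stand-in for whatever hash the ring really uses */
static uint64_t hash64(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint64_t h = 14695981039346656037ULL;

	while (len--) {
		h ^= *p++;
		h *= 1099511628211ULL;
	}
	return h;
}

/*
 * Generate vnodes per disk instead of per node: when one disk dies,
 * only its own vnodes leave the ring, and the objects they owned are
 * re-spread over all remaining disks in the cluster.
 */
static int gen_disk_vnodes(uint32_t node_id, uint32_t disk_id,
			   struct vnode *out)
{
	for (int i = 0; i < VNODES_PER_DISK; i++) {
		uint64_t key[3] = { node_id, disk_id, (uint64_t)i };

		out[i].hash = hash64(key, sizeof(key));
		out[i].node_id = node_id;
		out[i].disk_id = disk_id;
	}
	return VNODES_PER_DISK;
}

int main(void)
{
	struct vnode v[VNODES_PER_DISK];
	int n = gen_disk_vnodes(0, 1, v);

	printf("disk 0/1 -> %d vnodes, first at %016llx\n",
	       n, (unsigned long long)v[0].hash);
	return 0;
}
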
> >
> > Kazutaka and Hitoshi, any comments? Providing a means for users to
> > configure the disk, instead of the whole node, as the basic ring unit
> > might be interesting to users who care more about recovery performance.
> >
> > Thanks
> > Yuan
> >
>
>
>
> --
> Best Regards
> Robin Dong
>