[sheepdog] [PATCH v2 0/5] using disks to generate vnodes instead of nodes

Robin Dong robin.k.dong at gmail.com
Mon May 19 03:22:10 CEST 2014


Hi Hitoshi,

Thanks for your suggestion.

I will update "dog cluster info" soon.
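
For reference, here is a minimal sketch in C of the recovery-traffic
arithmetic from the cover letter quoted below (illustrative only, not
actual sheepdog code; the variable names are made up). It uses the
example numbers: 3 nodes with two 4TB disks each and 8:4 erasure
coding.

    #include <stdio.h>

    int main(void)
    {
        const double disk_tb   = 4.0; /* capacity of the failed disk */
        const int    ec_data   = 8;   /* strips read per strip rebuilt (8:4) */
        const int    surviving = 5;   /* disks left in the 3x2 cluster */

        /* Node-based vnodes: the failed disk's node reads ec_data
         * strips from remote nodes for every strip it regenerates. */
        printf("node-based: one node fetches %.1f TB\n",
               disk_tb * ec_data);

        /* Disk-based vnodes: the regenerated data is spread over all
         * surviving disks, so each one receives only its share. */
        printf("disk-based: each disk receives %.1f TB\n",
               disk_tb / surviving);

        return 0;
    }

This prints 32.0 TB for the node-based case and 0.8 TB per disk for
the disk-based case, matching the numbers in the cover letter.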


2014-05-16 22:37 GMT+08:00 Hitoshi Mitake <mitake.hitoshi at gmail.com>:

> At Fri, 16 May 2014 16:17:20 +0800,
> Robin Dong wrote:
> >
> > Hi Kazutaka and Hitoshi,
> >
> > Could you give some suggestions about this patchset ?
>
> I really like your idea. It must be useful for machines with a bunch
> of disks. Of course there are some points for improvement (e.g. the
> introduced #ifdefs should be removed in the future), but it seems to
> be a good first step.
>
> Reviewed-by: Hitoshi Mitake <mitake.hitoshi at lab.ntt.co.jp>
>
> BTW, I have one request: could you update "dog cluster info" to
> print information about node changes?
>
> Thanks,
> Hitoshi
>
> >
> >
> > 2014-05-09 16:18 GMT+08:00 Liu Yuan <namei.unix at gmail.com>:
> >
> > > On Wed, May 07, 2014 at 06:25:37PM +0800, Robin Dong wrote:
> > > > From: Robin Dong <sanbai at taobao.com>
> > > >
> > > > When a disk fails in a sheepdog cluster, at present only the data
> > > > on that one node is moved to recover the lost data. This process
> > > > is very slow if the failed disk is very large (for example, 4TB).
> > > >
> > > > For example, say the cluster has three nodes (A, B, C), every
> > > > node has two disks, and every disk is 4TB. The cluster uses 8:4
> > > > erasure coding. When a disk on node A fails, node A has to fetch
> > > > 8 strips to regenerate each piece of corrupted data. To
> > > > regenerate 4TB of data, it will fetch 4 * 8 = 32TB from remote
> > > > nodes, which is very inefficient.
> > > >
> > > > The solution to speed up recovery is to use the disks to generate
> > > > the vnodes, so that the failure of one disk makes the whole
> > > > cluster reweight and move data.
> > > >
> > > > In the example above, all the vnodes on the hashing ring are
> > > > generated per disk. Therefore, when a disk is gone, all the
> > > > vnodes after it take part in the recovery work; that is, almost
> > > > all the disks in the cluster share the 4TB of data. The cluster
> > > > will use the 5 remaining disks to store the regenerated data, so
> > > > each disk only needs to receive 4 / 5 = 0.8TB.
> > > >
> > >
> > > Kazutaka and Hitoshi, any comments? Providing a means that lets
> > > users configure the disk, instead of the whole node, as the basic
> > > ring unit might interest users who care about recovery performance.
> > >
> > > Thanks
> > > Yuan
> > >
> >
> >
> >
> > --
> > Best Regards
> > Robin Dong
> >
>
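
For anyone skimming the thread, here is a minimal sketch of the
per-disk vnode idea discussed above (illustrative only, not sheepdog's
actual ring code; the hash and the naming scheme are made up). Each
disk, rather than each node, contributes vnodes to the hashing ring,
so removing one disk remaps its objects to successors owned by many
different disks:

    #include <stdio.h>
    #include <stdint.h>

    /* FNV-1a, just to spread vnode names over the ring for the demo. */
    static uint64_t fnv1a(const char *s)
    {
        uint64_t h = 14695981039346656037ULL;

        for (; *s; s++) {
            h ^= (unsigned char)*s;
            h *= 1099511628211ULL;
        }
        return h;
    }

    int main(void)
    {
        /* 3 nodes (A, B, C) with two disks each, as in the example. */
        static const char *disks[] = { "A:0", "A:1", "B:0", "B:1",
                                       "C:0", "C:1" };
        enum { NR_DISKS = 6, VNODES_PER_DISK = 4 };
        char name[32];

        for (int d = 0; d < NR_DISKS; d++)
            for (int v = 0; v < VNODES_PER_DISK; v++) {
                snprintf(name, sizeof(name), "%s#%d", disks[d], v);
                printf("vnode %016llx -> disk %s\n",
                       (unsigned long long)fnv1a(name), disks[d]);
            }
        return 0;
    }

Dropping, say, disk A:0 removes only its four vnodes; the objects that
hashed to them move to the next vnodes on the ring, which belong to the
five surviving disks, so the recovery load is spread across the cluster.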



-- 
Best Regards
Robin Dong