[sheepdog] [PATCH v5 0/6] using disks to generate vnodes instead of nodes

Robin Dong robin.k.dong at gmail.com
Wed May 21 05:41:53 CEST 2014


From: Robin Dong <sanbai at taobao.com>

When a disk fails in a sheepdog cluster, at present only one node moves data
to recover it. This process is very slow if the corrupted disk is very large
(for example, 4TB).

For example, suppose the cluster has three nodes (node A, B, C), every node has
two disks, and every disk is 4TB in size. The cluster uses 8:4 erasure coding.
When a disk on node A is corrupted, node A will try to fetch 8 copies to
regenerate each piece of corrupted data. To regenerate 4TB of data, it will
fetch 4 * 8 = 32TB of data from remote nodes, which is very inefficient.
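
The arithmetic above can be written out as a small sketch. The function name
and parameters are hypothetical and exist only for illustration; the sketch
assumes that rebuilding each lost strip requires reading as many strips as
there are data strips in the erasure-code scheme (8 for 8:4).

```python
def rebuild_read_tb(lost_tb, data_strips):
    """Data (in TB) fetched from remote nodes to rebuild `lost_tb` of strips.

    Assumption: every lost strip is regenerated by reading `data_strips`
    surviving strips of the same object.
    """
    return lost_tb * data_strips


# One 4TB disk lost under 8:4 erasure coding.
print(rebuild_read_tb(4, 8))
```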

The solution to speed up recovery is to use disks to generate vnodes, so that
the failure of one disk causes the whole cluster to reweight and move data.

Taking the example above, all the vnodes in the hash ring are generated from
disks. Therefore, when a disk is gone, all the vnodes after it do the recovery
work; that is, almost all the disks in the cluster share the 4TB of data.
This means the cluster uses 5 disks to store the regenerated data, so each disk
only needs to receive 4 / 5 = 0.8TB of data.
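
The effect described above can be illustrated with a toy consistent-hash ring.
This is only a sketch, not sheepdog's implementation: the SHA-1 based hash, the
128 vnodes per disk, the disk names, and the single-copy placement (no
replication or erasure-code strips) are all assumptions made for illustration.

```python
import hashlib
from bisect import bisect_right
from collections import Counter


def ring_hash(key):
    # Hypothetical hash; any uniform 64-bit hash would do for this sketch.
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


def build_ring(disks, vnodes_per_disk=128):
    # Each disk (not each node) contributes its own vnodes to the ring.
    return sorted((ring_hash(f"{d}#{i}"), d)
                  for d in disks for i in range(vnodes_per_disk))


def placement(disks, oids):
    # Map every object to the disk owning the first vnode after its hash.
    ring = build_ring(disks)
    points = [p for p, _ in ring]
    return {oid: ring[bisect_right(points, ring_hash(oid)) % len(ring)][1]
            for oid in oids}


# Three nodes (A, B, C) with two disks each, as in the example above.
disks = [f"{node}/disk{i}" for node in "ABC" for i in (0, 1)]
oids = [f"obj-{i}" for i in range(20000)]

failed = "A/disk0"
before = placement(disks, oids)
after = placement([d for d in disks if d != failed], oids)

# Objects that changed owner, and the disks that now receive them.
moved = [o for o in oids if before[o] != after[o]]
receivers = Counter(after[o] for o in moved)

for disk, count in sorted(receivers.items()):
    print(f"{disk}: {count / len(moved):.2f} of the recovered objects")
```

In this sketch only the objects that lived on the failed disk move, and they
are spread over the five surviving disks, so each disk receives roughly one
fifth of the lost data.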

Signed-off-by: Robin Dong <sanbai at taobao.com>
Reviewed-by: Hitoshi Mitake <mitake.hitoshi at lab.ntt.co.jp>
---
v1-->v2:
  1. use cinfo.flags instead of a macro to check disk mode

v2-->v3:
  1. change function name from 'nodes_changed' to 'membership_changed'
  2. add number of disks for each node in 'dog cluster info' information

v3-->v4:
  1. pass 'rinfo' to rollback_vnode_info() instead of using main_thread_get()
  2. use the '-v' option of 'dog cluster info' to show backend type and mode

v4-->v5:
  1. add a comment to describe the disadvantage of disk mode
  2. use grab_vnode_info() instead of refcount_inc()
  3. reduce space for cache of vnode_info so we don't need to malloc and free
     SD_MAX_NODES nodes now
  4. do not check 'verbose' twice

Robin Dong (6):
  sheep: add new option for configure
  sheep: add disk information into sd_node
  sheep: change method of generating vnodes
  sheep/md: change the method of generating vnodes in md
  sheep: cache vnode_info when doing recovery
  dog: add information of disks in cluster info

 configure.ac             | 10 +++++
 dog/cluster.c            | 55 +++++++++++++++++++--------
 dog/dog.c                | 20 +++++++++-
 dog/vdi.c                |  5 ++-
 include/internal_proto.h | 24 ++++++++++--
 include/sheep.h          | 40 ++++++++++++++++++++
 sheep/config.c           |  6 +++
 sheep/group.c            | 54 +++++++++++++++++++++++++--
 sheep/md.c               | 96 ++++++++++++++++++++++++++++++++++--------------
 sheep/ops.c              |  1 +
 sheep/recovery.c         | 42 ++++++++++++++++++---
 sheep/sheep.c            |  4 ++
 sheep/sheep_priv.h       | 28 ++++++++++++++
 13 files changed, 326 insertions(+), 59 deletions(-)

-- 
1.7.12.4



