[sheepdog] [PATCH v5 0/6] using disks to generate vnodes instead of nodes
Liu Yuan
namei.unix at gmail.com
Wed May 21 07:43:13 CEST 2014
On Wed, May 21, 2014 at 11:41:53AM +0800, Robin Dong wrote:
> From: Robin Dong <sanbai at taobao.com>
>
> When a disk fails in a sheepdog cluster, recovery currently moves data into
> only one node. This process is very slow if the failed disk is very large
> (for example, 4TB).
>
> For example, the cluster has three nodes (node A, B, C), every node has two
> disks, and every disk's size is 4TB. The cluster is using 8:4 erasure coding.
> When a disk on node A is corrupted, node A will try to fetch 8 copies to
> regenerate each piece of corrupted data. To regenerate 4TB of data, it will
> fetch 4 * 8 = 32TB of data from remote nodes, which is very inefficient.
>
> The solution to speed up recovery is to use disks to generate vnodes, so that
> the failure of one disk causes the whole cluster to reweight and move data.
>
> Taking the example above, all the vnodes in the hash ring are generated by
> disks. Therefore, when a disk is gone, all the vnodes after it take part in the
> recovery work; that is, almost all the disks in the cluster share the 4TB of
> data. This means the cluster uses 5 disks to store the regenerated data, so
> each disk only needs to receive 4 / 5 = 0.8TB of data.
>
> Signed-off-by: Robin Dong <sanbai at taobao.com>
> Reviewed-by: Hitoshi Mitake <mitake.hitoshi at lab.ntt.co.jp>
> ---
> v1-->v2:
> 1. use cinfo.flags instead of a macro to check disk mode
>
> v2-->v3:
> 1. change function name from 'nodes_changed' to 'membership_changed'
> 2. add number of disks for each node in 'dog cluster info' information
>
> v3-->v4:
> 1. pass 'rinfo' to rollback_vnode_info() instead of using main_thread_get()
> 2. use the '-v' option for 'dog cluster info' to show backend type and mode
>
> v4-->v5:
> 1. add a comment to describe the disadvantage of disk mode
> 2. use grab_vnode_info() instead of refcount_inc()
> 3. reduce space for cache of vnode_info so we don't need to malloc and free
> SD_MAX_NODES nodes now
> 4. do not check 'verbose' twice
>
Applied after removing dead reset_vinfo_array(). Thanks.
Yuan
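
For readers following the numbers in the cover letter, here is a minimal
sketch of the arithmetic and of the per-disk vnode idea. It is not sheepdog
code. Everything in it is an assumption made for illustration: the function
names, the SHA-1 stand-in hash, 128 vnodes per disk, and the simplification
that each object has a single owner (erasure-coded strip placement is not
modelled).

```python
import hashlib
from bisect import bisect_right
from collections import Counter


def node_mode_fetch_tb(lost_tb=4, data_strips=8):
    """TB the single recovering node reads from remote nodes (4 * 8)."""
    return lost_tb * data_strips


def disk_mode_receive_tb(lost_tb=4, total_disks=6):
    """TB each surviving disk stores if the lost data spreads evenly."""
    return lost_tb / (total_disks - 1)


def _point(key):
    # Stand-in hash (assumption); sheepdog uses its own hash function.
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


def build_ring(disks, vnodes_per_disk=128):
    """Return sorted (point, disk) pairs, vnodes_per_disk points per disk."""
    return sorted((_point(f"{d}/{i}"), d)
                  for d in disks for i in range(vnodes_per_disk))


def owner(ring, points, oid):
    """The first vnode clockwise from the object's hash point owns it."""
    return ring[bisect_right(points, _point(oid)) % len(ring)][1]


def receivers_after_failure(disks, failed, n_objects=20000):
    """Count, per surviving disk, the objects it inherits from `failed`."""
    before = build_ring(disks)
    after = build_ring([d for d in disks if d != failed])
    before_pts = [p for p, _ in before]
    after_pts = [p for p, _ in after]
    moved = Counter()
    for i in range(n_objects):
        oid = f"obj-{i}"
        if owner(before, before_pts, oid) == failed:
            moved[owner(after, after_pts, oid)] += 1
    return moved


# Three nodes (A, B, C) with two disks each, as in the example above.
DISKS = [f"{node}{n}" for node in "ABC" for n in (0, 1)]
```

Calling receivers_after_failure(DISKS, "A0") returns a counter keyed by the
surviving disks. Because every disk contributes its own vnodes to the ring,
the objects that lived on A0 are inherited by disks across the whole cluster
rather than by a single node, which is the effect the series relies on.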