[sheepdog] [PATCH v5 0/6] using disks to generate vnodes instead of nodes

Liu Yuan namei.unix at gmail.com
Wed May 21 07:43:13 CEST 2014


On Wed, May 21, 2014 at 11:41:53AM +0800, Robin Dong wrote:
> From: Robin Dong <sanbai at taobao.com>
> 
> When a disk fails in a sheepdog cluster, recovery currently rebuilds the data
> only within one node. This process is very slow if the corrupted disk is very
> large (for example, 4TB).
> 
> For example, suppose the cluster has three nodes (A, B and C), each node has
> two disks, and each disk is 4TB. The cluster uses 8:4 erasure coding.
> When a disk on node A is corrupted, node A has to fetch 8 strips to regenerate
> each piece of corrupted data. To regenerate 4TB of data, it will fetch
> 4 * 8 = 32TB of data from remote nodes, which is very inefficient.
> 
> To speed up recovery, this patch set uses disks instead of nodes to generate
> vnodes, so the failure of one disk causes the whole cluster to reweight and
> move data.
> 
> In the example above, all the vnodes in the hash ring are generated from
> disks. Therefore, when a disk is gone, the vnodes following it on the ring take
> over the recovery work; that is, almost every disk in the cluster shares the
> 4TB of data. The cluster will use the 5 remaining disks to store the
> regenerated data, so each disk only needs to receive 4 / 5 = 0.8TB of data.
> 
> Signed-off-by: Robin Dong <sanbai at taobao.com>
> Reviewed-by: Hitoshi Mitake <mitake.hitoshi at lab.ntt.co.jp>
> ---
> v1-->v2:
>   1. use cinfo.flags instead of a macro to check disk mode
> 
> v2-->v3:
>   1. change function name from 'nodes_changed' to 'membership_changed'
>   2. add the number of disks for each node to the 'dog cluster info' output
> 
> v3-->v4:
>   1. pass 'rinfo' to rollback_vnode_info() instead of using main_thread_get()
>   2. use the '-v' option of 'dog cluster info' to show backend type and mode
> 
> v4-->v5:
>   1. add a comment to describe the disadvantage of disk mode
>   2. use grab_vnode_info() instead of refcount_inc()
>   3. reduce the space for the vnode_info cache so we no longer need to malloc
>      and free SD_MAX_NODES nodes
>   4. don't check 'verbose' twice
> 
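The two figures in the cover letter (32TB fetched in node mode, 0.8TB received per disk in disk mode) can be reproduced with a little arithmetic. Below is a minimal C sketch; the helper names are hypothetical and not part of sheepdog, and the disk-mode figure assumes the lost data is spread perfectly evenly over the surviving disks. Note that the two helpers measure different things: total data read from remote nodes versus data written to each surviving disk.

```c
#include <assert.h>

/*
 * Hypothetical helper (not sheepdog code). Node mode: rebuilding lost data
 * under a k:m erasure code reads k strips for every lost strip, so the
 * remote traffic is lost * k.
 */
static double node_mode_fetch_tb(double lost_tb, int data_strips)
{
	assert(lost_tb >= 0.0 && data_strips > 0);
	return lost_tb * data_strips;
}

/*
 * Hypothetical helper. Disk mode: assuming the lost data is spread evenly
 * over every surviving disk, each disk receives lost / (total disks - 1).
 */
static double disk_mode_share_tb(double lost_tb, int nr_nodes,
				 int disks_per_node)
{
	int survivors = nr_nodes * disks_per_node - 1; /* minus the failed disk */

	assert(lost_tb >= 0.0 && survivors > 0);
	return lost_tb / survivors;
}
```

With the cover letter's numbers, `node_mode_fetch_tb(4.0, 8)` gives 32.0 and `disk_mode_share_tb(4.0, 3, 2)` gives 0.8. The even spread is an idealization; the real per-disk share depends on where each disk's vnodes fall on the ring.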

Applied after removing the dead reset_vinfo_array(). Thanks.

Yuan


