[sheepdog] [PATCH v4 0/6] using disks to generate vnodes instead of nodes

Tue May 20 11:26:12 CEST 2014

On Tue, May 20, 2014 at 04:40:28PM +0800, Robin Dong wrote:
> From: Robin Dong <sanbai at taobao.com>
> 
> When a disk fails in a sheepdog cluster, at present only the node that owns
> the disk moves data to recover it. This process is very slow if the failed
> disk is very large (for example, 4TB).
> 
> For example, suppose the cluster has three nodes (A, B, C), every node has
> two disks, and every disk is 4TB. The cluster uses 8:4 erasure coding.
> When a disk on node A is corrupted, node A will try to fetch 8 copies to
> regenerate each corrupted piece of data. To regenerate 4TB of data, it will
> fetch 4 * 8 = 32TB of data from remote nodes, which is very inefficient.
> 
> The solution to speed up recovery is to use disks to generate vnodes, so
> that the failure of one disk causes the whole cluster to reweight and move
> data.
> 
> Taking the example above, all the vnodes in the hash ring are generated from
> disks. Therefore, when a disk is gone, all the vnodes after it should do the
> recovery work; that is, almost all the disks in the cluster will share the
> 4TB of data. This means the cluster will use 5 disks to store the
> regenerated data, so each disk only needs to receive 4 / 5 = 0.8TB of data.
> 
> Signed-off-by: Robin Dong <sanbai at taobao.com>
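
To make the figures in the cover letter concrete, here is a minimal Python
sketch of a hash ring whose vnodes are generated per disk rather than per
node. It is only an illustration under stated assumptions, not sheepdog's
actual code: the MD5-based hash, the value of VNODES_PER_DISK, the disk
naming scheme and the helper names (ring_point, build_ring, locate) are all
hypothetical, and erasure-coded placement of multiple strips is ignored
(each object maps to a single disk).

```python
import bisect
import hashlib
from collections import Counter

VNODES_PER_DISK = 64  # assumed value, chosen for illustration only


def ring_point(key: str) -> int:
    """Map a string to a point on the ring (first 64 bits of its MD5)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def build_ring(disks):
    """Return a sorted list of (point, disk) pairs, one pair per vnode."""
    return sorted(
        (ring_point(f"{disk}#{i}"), disk)
        for disk in disks
        for i in range(VNODES_PER_DISK)
    )


def locate(ring, obj: str) -> str:
    """Return the disk of the first vnode at or after obj's point, wrapping."""
    # (p,) sorts before any (p, disk), so this finds the first point >= p.
    idx = bisect.bisect_left(ring, (ring_point(obj),)) % len(ring)
    return ring[idx][1]


# Three nodes (A, B, C) with two disks each, as in the example above.
disks = [f"{node}/disk{i}" for node in "ABC" for i in range(2)]
failed = "A/disk0"
objects = [f"obj-{i}" for i in range(2000)]

ring_before = build_ring(disks)
ring_after = build_ring([d for d in disks if d != failed])

old = {o: locate(ring_before, o) for o in objects}
new = {o: locate(ring_after, o) for o in objects}

# Only objects that lived on the failed disk should change owner.
moved = [o for o in objects if old[o] != new[o]]
receivers = Counter(new[o] for o in moved)

# Arithmetic from the cover letter: 4TB lost, 8 copies read per rebuild.
LOST_TB, COPIES_READ = 4, 8
read_traffic_tb = LOST_TB * COPIES_READ      # fetched by one node today
share_tb = LOST_TB / (len(disks) - 1)        # per surviving disk, ideal case

print("read traffic with node-level recovery (TB):", read_traffic_tb)
print("ideal share per surviving disk (TB):", share_tb)
print("objects moved:", len(moved), "receivers:", dict(receivers))
```

In this sketch only the objects that were on the failed disk change owner,
and they are handed to disks on several nodes, which is the effect the
series is after. The even 4 / 5 split is the ideal case; the actual
distribution depends on how evenly the vnodes fall on the ring.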

Please add Reviewed-by: Hitoshi Mitake <mitake.hitoshi at lab.ntt.co.jp>
in the commit log

Thanks
Yuan