[sheepdog] [PATCH v2 0/5] using disks to generate vnodes instead of nodes

Liu Yuan namei.unix at gmail.com
Fri May 9 10:18:25 CEST 2014


On Wed, May 07, 2014 at 06:25:37PM +0800, Robin Dong wrote:
> From: Robin Dong <sanbai at taobao.com>
> 
> When a disk fails in a sheepdog cluster, recovery currently moves data onto
> only one node. This process is very slow if the corrupted disk is
> very large (for example, 4TB).
> 
> For example, suppose the cluster has three nodes (A, B, C), every node has two
> disks, and every disk is 4TB. The cluster uses 8:4 erasure coding.
> When a disk on node A is corrupted, node A tries to get 8 copies to
> regenerate each piece of corrupted data. To regenerate 4TB of data, it fetches
> 4 * 8 = 32TB of data from remote nodes, which is very inefficient.
> 
> The solution for speeding up recovery is to use disks to generate
> vnodes, so that the failure of one disk causes the whole cluster to reweight
> and move data.
> 
> Taking the example above, all the vnodes in the hash ring are generated from
> disks. Therefore, when a disk is gone, all the vnodes after it do the recovery
> work; that is, almost all the disks in the cluster share the 4TB of data.
> This means the cluster uses 5 disks to store the regenerated data, so each disk
> only needs to receive 4 / 5 = 0.8TB of data.
> 
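
The arithmetic in the quoted example can be checked with a short sketch. It
assumes, as the cover letter does, that every lost piece is rebuilt from 8
remote pieces of the same size and that the rebuilt data spreads evenly over
the surviving disks. The function names are made up for illustration and are
not part of sheepdog.

```python
# Back-of-the-envelope numbers for the example above.
# Assumptions: 8 remote pieces are read per lost piece, and the regenerated
# data is spread evenly over all surviving disks. Names are illustrative only.

DISK_TB = 4            # size of the failed disk
PIECES_TO_READ = 8     # pieces fetched to regenerate one lost piece (8:4 code)
TOTAL_DISKS = 3 * 2    # three nodes, two disks each


def node_based_fetch_tb(disk_tb, pieces_to_read):
    """Data a single node fetches to rebuild the whole failed disk."""
    return disk_tb * pieces_to_read


def disk_based_receive_tb(disk_tb, total_disks):
    """Regenerated data each surviving disk stores when all of them share it."""
    return disk_tb / (total_disks - 1)


print(node_based_fetch_tb(DISK_TB, PIECES_TO_READ))   # 32
print(disk_based_receive_tb(DISK_TB, TOTAL_DISKS))    # 0.8
```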

Kazutaka and Hitoshi, any comments? Providing a means for users to configure
the disk instead of the whole node as the basic ring unit might be interesting
to some users who care more about recovery performance.
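
To make the "disk as ring unit" idea concrete, here is a minimal toy model of
a consistent-hash ring whose units are disks. It is only a sketch: the hash
function, the vnode count, the "node/disk" naming, and the single-copy
placement are all assumptions for illustration, not what sheepdog or this
patch set actually does.

```python
import hashlib
from bisect import bisect_right
from collections import Counter


def h(key):
    """64-bit hash of a string (a stand-in, not sheepdog's hash function)."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


def build_ring(units, vnodes_per_unit=64):
    """Sorted (hash, unit) pairs; here every unit is a hypothetical 'node/disk'."""
    return sorted((h(f"{u}#{i}"), u) for u in units for i in range(vnodes_per_unit))


def locate(ring, hashes, oid):
    """The first vnode clockwise from the object's hash owns the object."""
    idx = bisect_right(hashes, h(oid)) % len(ring)
    return ring[idx][1]


disks = [f"{node}/disk{d}" for node in "ABC" for d in (0, 1)]
failed = "A/disk0"
objects = [f"obj-{i}" for i in range(10000)]

# Ring with all six disks, and ring after the failed disk's vnodes are gone.
before = build_ring(disks)
after = build_ring([d for d in disks if d != failed])
before_hashes = [x for x, _ in before]
after_hashes = [x for x, _ in after]

old_owner = {o: locate(before, before_hashes, o) for o in objects}
new_owner = {o: locate(after, after_hashes, o) for o in objects}

# Which surviving disks take over the objects that lived on the failed disk?
receivers = Counter(new_owner[o] for o in objects if old_owner[o] == failed)
print(sorted(receivers.items()))
```

In this toy model only the objects that lived on the failed disk change owner,
and they are handed to the vnodes that follow the failed disk's vnodes on the
ring, which belong to several different surviving disks rather than to one.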

Thanks
Yuan


