| I tried Strategy 2 like this:
    for (v=0, o = 0; v < SD_MAX_VNODES; v++, o++) {
        if (o >= nr) o = 0;
        vnodes[v].id = UINT64_MAX / SD_MAX_VNODES * v;
        vnodes[v].node = &(nodes[o]);
        vnodes[v].node_idx = o;
    }
    for (v=0; v < SD_MAX_VNODES; v++) {
        c = ((unsigned int)(v+random()))%SD_MAX_VNODES;
        vnode = vnodes[v];
        vnodes[v] = vnodes[c];
        vnodes[c] = vnode;
    }
It surely helps to ensure an equally distribution.
But on the other hand, it’s important to track the mapping from vnode to node then, to minimize the data movement when join/leave happens.
It’s basically how swift (from openstack) do. And I found it overwhelm for sheepdog.
Strategy 3 is basically how glusterfs do.
But, recording hash space also means you’ll face a lot of data movement again.
glusterfs use “link” extensively to avoid data moving. Should we do that too ?
BTW: I found the easiest way to improve distribution while maintain the simplicity of current algo. Is to increase VNODES_PER_NODE.
The downside is, of course, more memory to use for the vnode_list. :(
FYI.
________________________________
Xinwei Hu
华为技术有限公司 Huawei Technologies Co., Ltd.
[Company_logo]
Phone:
Fax:
Mobile:
Email: huxinwei at huawei.com
地址:深圳市龙岗区坂田华为基地 邮编:518129
Huawei Technologies Co., Ltd.
Bantian, Longgang District,Shenzhen 518129, P.R.China
http://www.huawei.com
________________________________
本邮件及其附件含有华为公司的保密信息,仅限于发送给上面地址中列出的个人或群组。禁
止任何其他人以任何形式使用(包括但不限于全部或部分地泄露、复制、或散发)本邮件中
的信息。如果您错收了本邮件,请您立即电话或邮件通知发件人并删除本邮件!
This e-mail and its attachments contain confidential information from HUAWEI, which
is intended only for the person or entity whose address is listed above. Any use of the
information contained herein in any way (including, but not limited to, total or partial
disclosure, reproduction, or dissemination) by persons other than the intended
recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by
phone or email immediately and delete it!
From: sheepdog-bounces at lists.wpkg.org [mailto:sheepdog-bounces at lists.wpkg.org] On Behalf Of HaiTing Yao
Sent: Friday, May 04, 2012 10:24 AM
To: sheepdog at lists.wpkg.org
Subject: [Sheepdog] Add more hash strategy
Hi,
I read codes about hash now and tried to do something. I need some comments to do it properly.
Strategy 1 (We used now)
Random tokens per node
Partition by random token. The partition token is produced by hash function with node IP/port.
Strategy 2
Random tokens per node
Equal sized partitions. The hash space is divided into Q equally sized partitions. Q is the number of nodes or vnodes.
Strategy 3
Random tokens per node
Equal sized partitions. When there is node join/leaving, the equal size may be broken to minimize the data movement.
For strategy 3, need write hash space to disk. I think I can use the object to record the infomation.
Do we need add the strategy 2/3, or there is better strategies?
Thanks
Haiti
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.wpkg.org/pipermail/sheepdog/attachments/20120504/3c99f9b5/attachment.html>
 |  |