I tried Strategy 2 like this: for (v=0, o = 0; v < SD_MAX_VNODES; v++, o++) { if (o >= nr) o = 0; vnodes[v].id = UINT64_MAX / SD_MAX_VNODES * v; vnodes[v].node = &(nodes[o]); vnodes[v].node_idx = o; } for (v=0; v < SD_MAX_VNODES; v++) { c = ((unsigned int)(v+random()))%SD_MAX_VNODES; vnode = vnodes[v]; vnodes[v] = vnodes[c]; vnodes[c] = vnode; } It surely helps to ensure an equally distribution. But on the other hand, it’s important to track the mapping from vnode to node then, to minimize the data movement when join/leave happens. It’s basically how swift (from openstack) do. And I found it overwhelm for sheepdog. Strategy 3 is basically how glusterfs do. But, recording hash space also means you’ll face a lot of data movement again. glusterfs use “link” extensively to avoid data moving. Should we do that too ? BTW: I found the easiest way to improve distribution while maintain the simplicity of current algo. Is to increase VNODES_PER_NODE. The downside is, of course, more memory to use for the vnode_list. :( FYI. ________________________________ Xinwei Hu 华为技术有限公司 Huawei Technologies Co., Ltd. [Company_logo] Phone: Fax: Mobile: Email: huxinwei at huawei.com 地址:深圳市龙岗区坂田华为基地 邮编:518129 Huawei Technologies Co., Ltd. Bantian, Longgang District,Shenzhen 518129, P.R.China http://www.huawei.com ________________________________ 本邮件及其附件含有华为公司的保密信息,仅限于发送给上面地址中列出的个人或群组。禁 止任何其他人以任何形式使用(包括但不限于全部或部分地泄露、复制、或散发)本邮件中 的信息。如果您错收了本邮件,请您立即电话或邮件通知发件人并删除本邮件! This e-mail and its attachments contain confidential information from HUAWEI, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it! From: sheepdog-bounces at lists.wpkg.org [mailto:sheepdog-bounces at lists.wpkg.org] On Behalf Of HaiTing Yao Sent: Friday, May 04, 2012 10:24 AM To: sheepdog at lists.wpkg.org Subject: [Sheepdog] Add more hash strategy Hi, I read codes about hash now and tried to do something. I need some comments to do it properly. Strategy 1 (We used now) Random tokens per node Partition by random token. The partition token is produced by hash function with node IP/port. Strategy 2 Random tokens per node Equal sized partitions. The hash space is divided into Q equally sized partitions. Q is the number of nodes or vnodes. Strategy 3 Random tokens per node Equal sized partitions. When there is node join/leaving, the equal size may be broken to minimize the data movement. For strategy 3, need write hash space to disk. I think I can use the object to record the infomation. Do we need add the strategy 2/3, or there is better strategies? Thanks Haiti -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.wpkg.org/pipermail/sheepdog/attachments/20120504/3c99f9b5/attachment.html> |