Hi list, I made some tests against the consistent hashing algorithm of sheepdog. Here are some samples: ============================ number of objects: 600000, replication is 3, total is 1800000 the cluster contains 100 nodes expecting 18000 objects per node acceptable variation is 600 (objects per node) With 2 (2.00 percent) nodes add, 33915 objects (1.88 percent) need to be relocated. 34.00 per is underload, with least as 14757 32.00 per is overload, with most as 23376 number of objects: 600000, replication is 3, total is 1800000 the cluster contains 600 nodes expecting 3000 objects per node acceptable variation is 100 (objects per node) With 2 (0.00 percent) nodes add, 5881 objects (0.33 percent) need to be relocated. 33.00 per is underload, with least as 2342 31.00 per is overload, with most as 3790 number of objects: 200000, replication is 3, total is 600000 the cluster contains 200 nodes expecting 3000 objects per node acceptable variation is 100 (objects per node) With 2 (1.00 percent) nodes add, 6363 objects (1.06 percent) need to be relocated. 33.00 per is underload, with least as 2432 32.00 per is overload, with most as 3623 ============================ The object ID is generated via standard random() call. As you can see, currently algorithm is good enough to handle adding nodes. However, the distribution of objects on nodes is not even enough. The worst case is about 25% more/less objects on a single node. This also means we are going to waste about 25% disk space of the total. Could anyone comment on the testing result ? Am I testing it wrong ? Or there's really something to improve here. |