[Sheepdog] The distribution of objects is not even enough ?

Thu Apr 26 10:53:19 CEST 2012

Hi list,

 I made some tests against the consistent hashing algorithm of sheepdog. Here are some samples:
============================
number of objects: 600000, replication is 3, total is 1800000
the cluster contains 100 nodes
expecting 18000 objects per node
acceptable variation is 600 (objects per node)
With 2 (2.00 percent) nodes add, 33915 objects (1.88 percent) need to be relocated.
34.00 per is underload, with least as 14757
32.00 per is overload, with most as 23376

number of objects: 600000, replication is 3, total is 1800000
the cluster contains 600 nodes
expecting 3000 objects per node
acceptable variation is 100 (objects per node)
With 2 (0.00 percent) nodes add, 5881 objects (0.33 percent) need to be relocated.
33.00 per is underload, with least as 2342
31.00 per is overload, with most as 3790

number of objects: 200000, replication is 3, total is 600000
the cluster contains 200 nodes
expecting 3000 objects per node
acceptable variation is 100 (objects per node)
With 2 (1.00 percent) nodes add, 6363 objects (1.06 percent) need to be relocated.
33.00 per is underload, with least as 2432
32.00 per is overload, with most as 3623
============================

  The object ID is generated via standard random() call.
  As you can see, currently algorithm is good enough to handle adding nodes. 
  However, the distribution of objects on nodes is not even enough. The worst case is about 25% more/less objects on a single node. This also means we are going to waste about 25% disk space of the total.

  Could anyone comment on the testing result ? Am I testing it wrong ? Or there's really something to improve here.