[sheepdog] [PATCH v2 00/11] object reclaim based on generational reference counting
MORITA Kazutaka
morita.kazutaka at gmail.com
Tue Jun 18 19:14:20 CEST 2013
From: MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp>
v2:
- rebase onto master
- add test to check object reclaim feature
- rename *_obj() to *_object()
- add documentation about the algorithm
Currently, sheepdog doesn't reclaim objects until all the snapshots
are deleted. For example:
$ collie vdi create image
$ collie vdi write image < some_data
$ collie vdi snapshot image -s snap1
$ collie vdi write image < some_data
$ collie vdi delete image <- this doesn't reclaim the objects
of the image
$ collie vdi delete image -s snap1 <- this reclaims all the data objects
of both image and image:snap1
To resolve this problem we need to introduce something like reference
counting mechanism. However, a naive approach doesn't work well on
Sheepdog. It is because, when we create a snapshot or clone vdi, we
have to increment all the reference count of its data objects and send
many requests to all over the nodes.
This series uses generation reference counting [1], which is more
appropriate algorithm for distributed system like Sheepdog. With this
series, we can reclaim objects just when they are no longer necessary.
[1] B. Goldberg, Generation reference counting: A reduced
communication distributed storage reclamation scheme, PLDI '89
==
Here is a brief explanation ob the object reclaim algorithm. If you
want to know the details or correctness proof of the generational
reference counting, please refer the original paper:
Each data object contains the following structure in its inode object:
struct generation_reference {
int32_t generation;
int32_t count;
} data_ref;
The generation field identifies the generation of the reference and
the count field records the number of references copied from this
particular reference.
We also introduce a new type of object, a ledger object. The ledger
object exists for each data object and contains an array of int32_t
values. The i-th element of the array contains information about the
number of outstanding data object references in the i-th generation.
Generational reference counting is performed as follows:
1. When a vdi A is created, the initial data object references are
initialized as follows:
for (i = 0; i < MAX_DATA_OBJS; i++) {
A.data_ref[i].generation = 0;
A.data_ref[i].count = 0;
}
and a ledger object for each data object is initialized as follows:
ledger_object[0] = 1;
for (i = 1; i < MAX_DATA_OBJS; i++) {
ledger_object[i] = 0;
}
In practice, the ledger object is created and initialized when the
object is accessed for the first time, so we don't have to create
all the ledger objects when the vdi is created.
2. When a VDI B is created as a clone (or snapshot) of A, the new
reference B is initialized as follows:
for (i = 0; i < MAX_DATA_OBJS; i++) {
B.data_ref[i].generation = A.data_ref[i].generation + 1;
B.data_ref[i].count = 0;
}
In addition, A.data_ref.count's are incremented:
for (i = 0; i < MAX_DATA_OBJS; i++) {
A.data_ref[i].count++;
}
3. When a object o is removed, a decrement message is sent to its
ledger object containing the values of o.generation and o.count.
4. When a decrement message to o is received, in which the generation
field has the value i and the count field has the value c, the
following actions are performed:
o.ledger[i]--;
o.ledger[i + 1] += c;
When every element of the ledger becomes zero (for all i >= 0,
o.ledger[i] = 0), o can be reclaimed.
Note that some elements of the ledger may hold negative values.
MORITA Kazutaka (11):
sheep: introduce xfallocate and xftruncate helpers
use 4KB for trim_zero_sectors instead of 512B
sheep: introduce generational reference counting for object reclaim
sheep: introduce ledger objects
sheep: decrement generational reference count on vdi deletion
sheep: decrement generational reference count on copy-on-write
sheep: remove max children limit
sheep: introduce sparse objects
tests: show ledger objects in list
tests/functional: add test for object reclaim
doc: add documentation of object reclaim algorithm
collie/common.c | 12 +--
collie/farm/sha1_file.c | 2 +-
collie/vdi.c | 27 +++---
doc/object-reclaim.txt | 73 ++++++++++++++
include/internal_proto.h | 2 +
include/sheepdog_proto.h | 43 ++++++++-
include/util.h | 9 +-
lib/util.c | 83 +++++++++++++---
sheep/cluster/local.c | 6 +-
sheep/gateway.c | 112 ++++++++++++++++++++-
sheep/journal.c | 2 +-
sheep/migrate.c | 2 +-
sheep/ops.c | 102 +++++++++++++++++--
sheep/plain_store.c | 83 ++++++++++++++--
sheep/sheep_priv.h | 14 +++
sheep/store.c | 26 ++++-
sheep/vdi.c | 231 +++-----------------------------------------
sheepfs/volume.c | 3 +-
tests/functional/016.out | 8 +-
tests/functional/042.out | 10 +-
tests/functional/044.out | 8 +-
tests/functional/057.out | 140 +++++++++++++--------------
tests/functional/058.out | 16 +--
tests/functional/063.out | 28 +++---
tests/functional/064.out | 38 ++++----
tests/functional/065 | 40 ++++++++
tests/functional/065.out | 42 ++++++++
tests/functional/common.rc | 10 +-
tests/functional/group | 1 +
29 files changed, 762 insertions(+), 411 deletions(-)
create mode 100644 doc/object-reclaim.txt
create mode 100755 tests/functional/065
create mode 100644 tests/functional/065.out
--
1.7.9.5
More information about the sheepdog
mailing list