[sheepdog] [PATCH v3 0/9] object reclaim based on generational reference counting
Hitoshi Mitake
mitake.hitoshi at gmail.com
Sun Feb 23 11:08:27 CET 2014
On Sun, Feb 23, 2014 at 2:28 PM, Hitoshi Mitake
<mitake.hitoshi at gmail.com> wrote:
> The object reclaim doesn't support hypervolume yet. But hypervolume cannot be
> used as a virtual disk (both of qemu and tgt don't support it) currently. And
> the removal of old vdi deletion is acceptable for hypervolume because it doesn't
> support snapshot, etc. So I think this patchset can be applied to the master
> branch.
>
> The author of 9th patch (11th in the v2 patchset) is kept as
> Kazutaka-san. Because the patch doesn't contain difference from v2.
>
> The same code is pushed to:
> https://github.com/sheepdog/sheepdog/tree/snapshot-object-reclaim
>
> v3:
> - rebase onto master
Oops, sorry, I found a bug related to vdi cloning in this patchset.
I'll send v4 later.
Thanks,
Hitoshi
>
> v2:
> - rebase onto master
> - add test to check object reclaim feature
> - rename *_obj() to *_object()
> - add documentation about the algorithm
>
> Currently, sheepdog doesn't reclaim objects until all the snapshots
> are deleted. For example:
>
> $ dog vdi create image
> $ dog vdi write image < some_data
> $ dog vdi snapshot image -s snap1
> $ dog vdi write image < some_data
> $ dog vdi delete image <- this doesn't reclaim the objects
> of the image
> $ dog vdi delete image -s snap1 <- this reclaims all the data objects
> of both image and image:snap1
>
> To resolve this problem we need to introduce something like reference
> counting mechanism. However, a naive approach doesn't work well on
> Sheepdog. It is because, when we create a snapshot or clone vdi, we
> have to increment all the reference count of its data objects and send
> many requests to all over the nodes.
>
> This series uses generation reference counting [1], which is more
> appropriate algorithm for distributed system like Sheepdog. With this
> series, we can reclaim objects just when they are no longer necessary.
>
> [1] B. Goldberg, Generation reference counting: A reduced
> communication distributed storage reclamation scheme, PLDI '89
>
> ==
> Here is a brief explanation ob the object reclaim algorithm. If you
> want to know the details or correctness proof of the generational
> reference counting, please refer the original paper:
>
> Each data object contains the following structure in its inode object:
>
> struct generation_reference {
> int32_t generation;
> int32_t count;
> } data_ref;
>
> The generation field identifies the generation of the reference and
> the count field records the number of references copied from this
> particular reference.
>
> We also introduce a new type of object, a ledger object. The ledger
> object exists for each data object and contains an array of int32_t
> values. The i-th element of the array contains information about the
> number of outstanding data object references in the i-th generation.
> Generational reference counting is performed as follows:
>
> 1. When a vdi A is created, the initial data object references are
> initialized as follows:
>
> for (i = 0; i < MAX_DATA_OBJS; i++) {
> A.data_ref[i].generation = 0;
> A.data_ref[i].count = 0;
> }
>
> and a ledger object for each data object is initialized as follows:
>
> ledger_object[0] = 1;
> for (i = 1; i < MAX_DATA_OBJS; i++) {
> ledger_object[i] = 0;
> }
>
> In practice, the ledger object is created and initialized when the
> object is accessed for the first time, so we don't have to create
> all the ledger objects when the vdi is created.
>
> 2. When a VDI B is created as a clone (or snapshot) of A, the new
> reference B is initialized as follows:
>
> for (i = 0; i < MAX_DATA_OBJS; i++) {
> B.data_ref[i].generation = A.data_ref[i].generation + 1;
> B.data_ref[i].count = 0;
> }
>
> In addition, A.data_ref.count's are incremented:
>
> for (i = 0; i < MAX_DATA_OBJS; i++) {
> A.data_ref[i].count++;
> }
>
> 3. When a object o is removed, a decrement message is sent to its
> ledger object containing the values of o.generation and o.count.
>
> 4. When a decrement message to o is received, in which the generation
> field has the value i and the count field has the value c, the
> following actions are performed:
>
> o.ledger[i]--;
> o.ledger[i + 1] += c;
>
> When every element of the ledger becomes zero (for all i >= 0,
> o.ledger[i] = 0), o can be reclaimed.
>
> Note that some elements of the ledger may hold negative values.
>
> Hitoshi Mitake (8):
> sheep: introduce generational reference counting for object reclaim
> sheep: introduce ledger objects
> sheep: decrement generational reference count on vdi deletion
> sheep: decrement generational reference count on copy-on-write
> sheep: remove max children limit
> sheep: introduce sparse objects
> tests: show ledger objects in list
> tests/functional: add test for object reclaim
>
> MORITA Kazutaka (1):
> doc: add documentation of object reclaim algorithm
>
> doc/object-reclaim.txt | 73 +++++++++
> include/internal_proto.h | 2 +
> include/sheepdog_proto.h | 41 ++++-
> include/util.h | 3 +
> lib/util.c | 62 ++++++++
> sheep/gateway.c | 112 +++++++++++++-
> sheep/ops.c | 90 +++++++++++
> sheep/plain_store.c | 70 ++++++++-
> sheep/sheep_priv.h | 16 ++
> sheep/store.c | 25 +++
> sheep/vdi.c | 369 +++++++++------------------------------------
> tests/functional/016.out | 14 +-
> tests/functional/042 | 4 +-
> tests/functional/042.out | 44 +++---
> tests/functional/044.out | 14 +-
> tests/functional/057.out | 140 ++++++++---------
> tests/functional/058.out | 14 +-
> tests/functional/063.out | 28 ++--
> tests/functional/064.out | 38 ++---
> tests/functional/087 | 30 ++++
> tests/functional/087.out | 42 ++++++
> tests/functional/common.rc | 10 +-
> tests/functional/group | 1 +
> 23 files changed, 787 insertions(+), 455 deletions(-)
> create mode 100644 doc/object-reclaim.txt
> create mode 100755 tests/functional/087
> create mode 100644 tests/functional/087.out
>
> --
> 1.8.3.2
>
More information about the sheepdog
mailing list