[sheepdog] [PATCH v3 0/9] object reclaim based on generational reference counting

Hitoshi Mitake mitake.hitoshi at gmail.com
Sun Feb 23 11:08:27 CET 2014


On Sun, Feb 23, 2014 at 2:28 PM, Hitoshi Mitake
<mitake.hitoshi at gmail.com> wrote:
> The object reclaim doesn't support hypervolume yet. But hypervolume cannot be
> used as a virtual disk (both of qemu and tgt don't support it) currently. And
> the removal of old vdi deletion is acceptable for hypervolume because it doesn't
> support snapshot, etc. So I think this patchset can be applied to the master
> branch.
>
> The author of 9th patch (11th in the v2 patchset) is kept as
> Kazutaka-san. Because the patch doesn't contain difference from v2.
>
> The same code is pushed to:
> https://github.com/sheepdog/sheepdog/tree/snapshot-object-reclaim
>
> v3:
>  - rebase onto master

Oops, sorry, I found a bug related to vdi cloning in this patchset.
I'll send v4 later.

Thanks,
Hitoshi

>
> v2:
>  - rebase onto master
>  - add test to check object reclaim feature
>  - rename *_obj() to *_object()
>  - add documentation about the algorithm
>
> Currently, sheepdog doesn't reclaim objects until all the snapshots
> are deleted.  For example:
>
>  $ dog vdi create image
>  $ dog vdi write image < some_data
>  $ dog vdi snapshot image -s snap1
>  $ dog vdi write image < some_data
>  $ dog vdi delete image            <- this doesn't reclaim the objects
>                                          of the image
>  $ dog vdi delete image -s snap1   <- this reclaims all the data objects
>                                          of both image and image:snap1
>
> To resolve this problem we need to introduce something like reference
> counting mechanism.  However, a naive approach doesn't work well on
> Sheepdog.  It is because, when we create a snapshot or clone vdi, we
> have to increment all the reference count of its data objects and send
> many requests to all over the nodes.
>
> This series uses generation reference counting [1], which is more
> appropriate algorithm for distributed system like Sheepdog.  With this
> series, we can reclaim objects just when they are no longer necessary.
>
> [1] B. Goldberg, Generation reference counting: A reduced
> communication distributed storage reclamation scheme, PLDI '89
>
> ==
> Here is a brief explanation ob the object reclaim algorithm.  If you
> want to know the details or correctness proof of the generational
> reference counting, please refer the original paper:
>
> Each data object contains the following structure in its inode object:
>
>     struct generation_reference {
>         int32_t generation;
>         int32_t count;
>     } data_ref;
>
> The generation field identifies the generation of the reference and
> the count field records the number of references copied from this
> particular reference.
>
> We also introduce a new type of object, a ledger object.  The ledger
> object exists for each data object and contains an array of int32_t
> values. The i-th element of the array contains information about the
> number of outstanding data object references in the i-th generation.
> Generational reference counting is performed as follows:
>
> 1. When a vdi A is created, the initial data object references are
>    initialized as follows:
>
>        for (i = 0; i < MAX_DATA_OBJS; i++) {
>            A.data_ref[i].generation = 0;
>            A.data_ref[i].count = 0;
>        }
>
>    and a ledger object for each data object is initialized as follows:
>
>        ledger_object[0] = 1;
>        for (i = 1; i < MAX_DATA_OBJS; i++) {
>            ledger_object[i] = 0;
>        }
>
>    In practice, the ledger object is created and initialized when the
>    object is accessed for the first time, so we don't have to create
>    all the ledger objects when the vdi is created.
>
> 2. When a VDI B is created as a clone (or snapshot) of A, the new
>    reference B is initialized as follows:
>
>        for (i = 0; i < MAX_DATA_OBJS; i++) {
>            B.data_ref[i].generation = A.data_ref[i].generation + 1;
>            B.data_ref[i].count = 0;
>        }
>
>    In addition, A.data_ref.count's are incremented:
>
>        for (i = 0; i < MAX_DATA_OBJS; i++) {
>            A.data_ref[i].count++;
>        }
>
> 3. When a object o is removed, a decrement message is sent to its
>    ledger object containing the values of o.generation and o.count.
>
> 4. When a decrement message to o is received, in which the generation
>    field has the value i and the count field has the value c, the
>    following actions are performed:
>
>    o.ledger[i]--;
>    o.ledger[i + 1] += c;
>
>    When every element of the ledger becomes zero (for all i >= 0,
>    o.ledger[i] = 0), o can be reclaimed.
>
>    Note that some elements of the ledger may hold negative values.
>
> Hitoshi Mitake (8):
>   sheep: introduce generational reference counting for object reclaim
>   sheep: introduce ledger objects
>   sheep: decrement generational reference count on vdi deletion
>   sheep: decrement generational reference count on copy-on-write
>   sheep: remove max children limit
>   sheep: introduce sparse objects
>   tests: show ledger objects in list
>   tests/functional: add test for object reclaim
>
> MORITA Kazutaka (1):
>   doc: add documentation of object reclaim algorithm
>
>  doc/object-reclaim.txt     |  73 +++++++++
>  include/internal_proto.h   |   2 +
>  include/sheepdog_proto.h   |  41 ++++-
>  include/util.h             |   3 +
>  lib/util.c                 |  62 ++++++++
>  sheep/gateway.c            | 112 +++++++++++++-
>  sheep/ops.c                |  90 +++++++++++
>  sheep/plain_store.c        |  70 ++++++++-
>  sheep/sheep_priv.h         |  16 ++
>  sheep/store.c              |  25 +++
>  sheep/vdi.c                | 369 +++++++++------------------------------------
>  tests/functional/016.out   |  14 +-
>  tests/functional/042       |   4 +-
>  tests/functional/042.out   |  44 +++---
>  tests/functional/044.out   |  14 +-
>  tests/functional/057.out   | 140 ++++++++---------
>  tests/functional/058.out   |  14 +-
>  tests/functional/063.out   |  28 ++--
>  tests/functional/064.out   |  38 ++---
>  tests/functional/087       |  30 ++++
>  tests/functional/087.out   |  42 ++++++
>  tests/functional/common.rc |  10 +-
>  tests/functional/group     |   1 +
>  23 files changed, 787 insertions(+), 455 deletions(-)
>  create mode 100644 doc/object-reclaim.txt
>  create mode 100755 tests/functional/087
>  create mode 100644 tests/functional/087.out
>
> --
> 1.8.3.2
>



More information about the sheepdog mailing list