[sheepdog] [PATCH v2 1/2] sheep, dog: make recycling VID selectable

Liu Yuan namei.unix at gmail.com
Mon Mar 16 14:13:29 CET 2015


On Mon, Mar 16, 2015 at 09:39:35PM +0900, Hitoshi Mitake wrote:
> On Mon, Mar 16, 2015 at 7:42 PM, Liu Yuan <namei.unix at gmail.com> wrote:
> > On Mon, Mar 16, 2015 at 03:57:40PM +0900, Hitoshi Mitake wrote:
> >> This patch adds a new option -R to "dog cluster format". If user
> >> specifies the option during cluster format, the cluster will enable
> >> recycling VID (disabled in default).
> >>
> >> Signed-off-by: Hitoshi Mitake <mitake.hitoshi at lab.ntt.co.jp>
> >> ---
> >>  dog/cluster.c            | 9 ++++++++-
> >>  include/internal_proto.h | 1 +
> >>  sheep/ops.c              | 4 +++-
> >>  sheep/vdi.c              | 2 +-
> >>  4 files changed, 13 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/dog/cluster.c b/dog/cluster.c
> >> index e252316..bb61fcc 100644
> >> --- a/dog/cluster.c
> >> +++ b/dog/cluster.c
> >> @@ -25,6 +25,7 @@ static struct sd_option cluster_options[] = {
> >>       {'l', "lock", false, "Lock vdi to exclude multiple users"},
> >>       {'m', "multithread", false,
> >>        "use multi-thread for 'cluster snapshot save'"},
> >> +     {'R', "recyclevid", false, "enable recycling of VID"},
> >>       {'t', "strict", false,
> >>        "do not serve write request if number of nodes is not sufficient"},
> >>       {'z', "block_size_shift", true, "specify the shift num of default"
> >> @@ -43,6 +44,7 @@ static struct cluster_cmd_data {
> >>       char name[STORE_LEN];
> >>       bool fixed_vnodes;
> >>       bool use_lock;
> >> +     bool recycle_vid;
> >>  } cluster_cmd_data;
> >>
> >>  #define DEFAULT_STORE        "plain"
> >> @@ -169,6 +171,8 @@ static int cluster_format(int argc, char **argv)
> >>               hdr.cluster.flags |= SD_CLUSTER_FLAG_STRICT;
> >>       if (cluster_cmd_data.use_lock)
> >>               hdr.cluster.flags |= SD_CLUSTER_FLAG_USE_LOCK;
> >> +     if (cluster_cmd_data.recycle_vid)
> >> +             hdr.cluster.flags |= SD_CLUSTER_FLAG_RECYCLE_VID;
> >>
> >>  #ifdef HAVE_DISKVNODES
> >>       hdr.cluster.flags |= SD_CLUSTER_FLAG_DISKMODE;
> >> @@ -820,7 +824,7 @@ failure:
> >>  static struct subcommand cluster_cmd[] = {
> >>       {"info", NULL, "aprhvT", "show cluster information",
> >>        NULL, CMD_NEED_NODELIST, cluster_info, cluster_options},
> >> -     {"format", NULL, "bcltaphzTV", "create a Sheepdog store",
> >> +     {"format", NULL, "bcltaphzTVR", "create a Sheepdog store",
> >>        NULL, CMD_NEED_NODELIST, cluster_format, cluster_options},
> >>       {"shutdown", NULL, "aphT", "stop Sheepdog",
> >>        NULL, 0, cluster_shutdown, cluster_options},
> >> @@ -890,6 +894,9 @@ static int cluster_parser(int ch, const char *opt)
> >>       case 'V':
> >>               cluster_cmd_data.fixed_vnodes = true;
> >>               break;
> >> +     case 'R':
> >> +             cluster_cmd_data.recycle_vid = true;
> >> +             break;
> >>       }
> >>
> >>       return 0;
> >> diff --git a/include/internal_proto.h b/include/internal_proto.h
> >> index defbe6d..3a370b6 100644
> >> --- a/include/internal_proto.h
> >> +++ b/include/internal_proto.h
> >> @@ -157,6 +157,7 @@
> >>  #define SD_CLUSTER_FLAG_DISKMODE     0x0002 /* Disk mode for cluster */
> >>  #define SD_CLUSTER_FLAG_AUTO_VNODES  0x0004 /* Cluster vnodes strategy */
> >>  #define SD_CLUSTER_FLAG_USE_LOCK     0x0008 /* Lock/Unlock vdi */
> >> +#define SD_CLUSTER_FLAG_RECYCLE_VID  0x0010 /* Enable recycling of VID */
> >>
> >>
> >>  enum sd_status {
> >> diff --git a/sheep/ops.c b/sheep/ops.c
> >> index 0e5ac64..8c79a84 100644
> >> --- a/sheep/ops.c
> >> +++ b/sheep/ops.c
> >> @@ -198,7 +198,9 @@ static int post_cluster_del_vdi(const struct sd_req *req, struct sd_rsp *rsp,
> >>       if (ret == SD_RES_SUCCESS) {
> >>               atomic_set_bit(vid, sys->vdi_deleted);
> >>               vdi_mark_deleted(vid);
> >> -             run_vid_gc(vid);
> >> +
> >> +             if (sys->cinfo.flags & SD_CLUSTER_FLAG_RECYCLE_VID)
> >> +                     run_vid_gc(vid);
> >>       }
> >>
> >>       if (!sys->enable_object_cache)
> >> diff --git a/sheep/vdi.c b/sheep/vdi.c
> >> index 8114fb5..eac0367 100644
> >> --- a/sheep/vdi.c
> >> +++ b/sheep/vdi.c
> >> @@ -384,7 +384,7 @@ static int do_add_vdi_state(uint32_t vid, int nr_copies, bool snapshot,
> >>               already_exists = true;
> >>       }
> >>
> >> -     if (!already_exists)
> >> +     if (sys->cinfo.flags & SD_CLUSTER_FLAG_RECYCLE_VID && !already_exists)
> >>               update_vdi_family(parent_vid, entry, unordered);
> >>
> >>       sd_rw_unlock(&vdi_state_lock);
> >> --
> >> 1.9.1
> >>
> >> --
> >> sheepdog mailing list
> >> sheepdog at lists.wpkg.org
> >> https://lists.wpkg.org/mailman/listinfo/sheepdog
> >
> > It seems that your patch 21549a1bd4981fabcc09d062a647162127fe0637
> >
> > Author: Hitoshi Mitake <mitake.hitoshi at gmail.com>
> > Date:   Sun Jun 1 23:23:18 2014 +0900
> >
> >     sheep: don't recycle VDI ID
> >
> >     Recycling VDI IDs of deleted VDIs is a completely wrong idea. It
> >     breaks relations between inode objects and data objects. For example,
> >     it can cause a problem of corrupting cloned VDIs (see related
> >     issue). This patch forbids the recycling.
> >
> >     Related issue:
> >     https://bugs.launchpad.net/sheepdog-project/+bug/1317755
> >
> > conflict with your current vid recycling. I think it is not hard to allow
> > vid recycle for snapshot create, I'd give it a try.
> >
> 
> No, they don't conflict. The old policy of recycling has a possibility
> of data corruption (the issue of launchpad describes). And new one
> avoid the problem because it recycle VIDs with considering VDI family
> relations.

How about make 'dog vdi clone --no-share' as the default clone operation? And
we can add dog vdi clone --share to keep old behavior as optional. By this
manner, --no-share will save us from this kind of subtle problem. And your team
problem about vdi exhaustion will be achieved :).

This manner is not perfect, but it will benefit us:

1. stable code base since old algorithm is long tested.
2. we won't degrate vdi_lookup

Thanks,
Yuan



More information about the sheepdog mailing list