On Saturday, May 5, 2012, Liu Yuan wrote:

> On 05/04/2012 02:49 PM, levin li wrote:
> >
> > Cluster recovery may cause objects to be migrated from one node
> > to another, but the object list cache doesn't change. When
> > deleting an object we cannot find the right node in whose
> > cache the id stays, so we need to notify the list of deleted
> > objects to all nodes, to make them delete the specified
> > object ids from their object list caches.
> >
> > I reused the deletion_work_list and deletion_wqueue for the
> > object list cleanup work, so that we save some memory by not
> > starting a new work queue.
> >
> > The del_vdi is called in a worker thread, but it's locked by the
> > notification mechanism; other operations on deletion_work_list
> > and deletion_wqueue are done in the main thread, so there's no need
> > to worry about race conditions.
> >
> > Signed-off-by: levin li <xingke.lwp at taobao.com>
> > ---
> >  include/sheep.h           |  1 +
> >  sheep/group.c             |  1 +
> >  sheep/object_list_cache.c | 17 +++++++++
> >  sheep/ops.c               | 57 ++++++++++++++++++++++++++++++
> >  sheep/sheep_priv.h        | 21 ++++++++++++
> >  sheep/vdi.c               | 84 ++++++++++++++++++++++++++++++---------------
> >  6 files changed, 154 insertions(+), 27 deletions(-)
> >
> > diff --git a/include/sheep.h b/include/sheep.h
> > index 7e287c4..e941dc1 100644
> > --- a/include/sheep.h
> > +++ b/include/sheep.h
> > @@ -46,6 +46,7 @@
> >  #define SD_OP_TRACE 0x95
> >  #define SD_OP_TRACE_CAT 0x96
> >  #define SD_OP_STAT_RECOVERY 0x97
> > +#define SD_OP_NOTIFY_VDI_DEL 0x98
> >
> >  #define SD_FLAG_CMD_IO_LOCAL 0x0010
> >  #define SD_FLAG_CMD_RECOVERY 0x0020
> > diff --git a/sheep/group.c b/sheep/group.c
> > index c7fd387..1f1aa8d 100644
> > --- a/sheep/group.c
> > +++ b/sheep/group.c
> > @@ -1314,6 +1314,7 @@ int create_cluster(int port, int64_t zone, int nr_vnodes)
> >  	INIT_LIST_HEAD(&sys->req_wait_for_obj_list);
> >  	INIT_LIST_HEAD(&sys->consistent_obj_list);
> >  	INIT_LIST_HEAD(&sys->blocking_conn_list);
> > +	INIT_LIST_HEAD(&sys->deletion_work_list);
> >
> >  	INIT_LIST_HEAD(&sys->request_queue);
> >  	INIT_LIST_HEAD(&sys->event_queue);
> > diff --git a/sheep/object_list_cache.c b/sheep/object_list_cache.c
> > index 28cdbbc..0aa93e5 100644
> > --- a/sheep/object_list_cache.c
> > +++ b/sheep/object_list_cache.c
> > @@ -149,3 +149,20 @@ int get_obj_list(const struct sd_list_req *hdr, struct sd_list_rsp *rsp, void *d
> >
> >  	return res;
> >  }
> > +
> > +int del_vdi_from_objlist_cache(uint64_t *objs, int count)
> > +{
> > +	int i;
> > +
> > +	dprintf("%d\n", count);
> > +	for (i = 0; i < count; i++) {
> > +		pthread_rwlock_wrlock(&obj_list_cache.lock);
> > +		if (!objlist_cache_rb_remove(&obj_list_cache.root, objs[i])) {
> > +			dprintf("remove oid %" PRIx64 " from objlist cache\n", objs[i]);
> > +			obj_list_cache.cache_size--;
> > +		}
> > +		pthread_rwlock_unlock(&obj_list_cache.lock);
> > +	}
> > +
> > +	return 0;
> > +}
> > diff --git a/sheep/ops.c b/sheep/ops.c
> > index b6f8eb2..5763b6f 100644
> > --- a/sheep/ops.c
> > +++ b/sheep/ops.c
> > @@ -550,6 +550,57 @@ static int cluster_cleanup(const struct sd_req *req, struct sd_rsp *rsp,
> >  	return ret;
> >  }
> >
> > +static void objlist_cache_del_vdi_work(struct work *work)
> > +{
> > +	struct deletion_work *dw = container_of(work, struct deletion_work, work);
> > +
> > +	del_vdi_from_objlist_cache(dw->deleted_objs, dw->deleted_count);
>
> This name looks confusing, I guess you mean undeleted objects as in
> dw->deleted_objs, dw->deleted_count.

No, here I do mean deleted objects indeed: this buffer records which
objects have been deleted, so that we can then notify this list to the
other nodes.
> > +
> >  	if (dw->delete_error) {
> >  		write_object(dw->vnodes, dw->epoch, vid_to_vdi_oid(vdi_id),
> >  			     (void *)inode, sizeof(*inode), 0, 0, nr_copies, 0);
> > @@ -505,8 +535,8 @@ static void delete_one_done(struct work *work)
> >  	free(dw->buf);
> >  	free(dw);
> >
> > -	if (!list_empty(&deletion_work_list)) {
> > -		dw = list_first_entry(&deletion_work_list,
> > +	if (!list_empty(&sys->deletion_work_list)) {
> > +		dw = list_first_entry(&sys->deletion_work_list,
> >  				      struct deletion_work, dw_siblings);
> >
> >  		queue_work(sys->deletion_wqueue, &dw->work);
> > @@ -647,12 +677,12 @@ int start_deletion(uint32_t vid, uint32_t epoch)
> >  	if (dw->count == 0)
> >  		goto out;
> >
> > -	if (!list_empty(&deletion_work_list)) {
> > -		list_add_tail(&dw->dw_siblings, &deletion_work_list);
> > +	if (!list_empty(&sys->deletion_work_list)) {
> > +		list_add_tail(&dw->dw_siblings, &sys->deletion_work_list);
> >  		goto out;
> >  	}
> >
> > -	list_add_tail(&dw->dw_siblings, &deletion_work_list);
> > +	list_add_tail(&dw->dw_siblings, &sys->deletion_work_list);
> >  	queue_work(sys->deletion_wqueue, &dw->work);
>
> it looks redundant to have deletion_work_list here to get FIFO semantics.

We only have one worker for deletion_wqueue, isn't it FIFO already?
Maybe kazutaka can explain this.

> > out:
> >  	return SD_RES_SUCCESS;
>
> This code looks really confusing, I think you need to rework the logic
> first. Take object recovery as an example: recovery_object just
> recovers the targeted object, one at a time. Your delete_one I guess
> serves the same purpose as its name suggests, but you actually try to
> delete all the objects in one go.

It's different from recovery_object: there, the name means recovering a
single object, but here delete_one means deleting a single VDI, not a
single object. More than one VDI may need to be deleted in one deletion
work, so what delete_one does is delete one of those VDIs.
> By the way, please run script/checkpatch.pl, there are some style
> problems.

OK, I will check my latest patch with this script. Maybe this version
can be dropped, as I just submitted the 5th version yesterday.

thanks,
levin

> Thanks,
> Yuan

-- 
levin
basiccoder.com