[Sheepdog] [PATCH v5] remove oids from object list cache when deleting a vdi

levin li levin108 at gmail.com
Sat May 5 08:22:37 CEST 2012


On 05/05/2012 02:08 PM, levin wrote:
>
>
> On Saturday, May 5, 2012, Liu Yuan wrote:
>
>     On 05/04/2012 02:49 PM, levin li wrote:
>
>     > Cluster recovery may cause objects migrated from one node
>     > to another, but the object list cache doesn't change, when
>     > deleting an object we can not find the right node in whose
>     > cache the id stays, so we need to notify the deletion object
>     > list to all the node to make them delete the specified
>     > object id from object list cache.
>     >
>     > I reused the deletion_work_list and deletion_wqueue for the
>     > object list cleanup work, so that we save some memory by not
>     > starting a new work queue.
>     >
>     > The del_vdi is called in worker thread, but it's locked by the
>     > notification mechanism, other operations about deletion_work_list
>     > and deletion_wqueue are done in main thread, so there's no need
>     > to concern about the race condition.
>     >
>     > Signed-off-by: levin li <xingke.lwp at taobao.com>
>     > ---
>     >  include/sheep.h           |    1 +
>     >  sheep/group.c             |    1 +
>     >  sheep/object_list_cache.c |   17 +++++++++
>     >  sheep/ops.c               |   57 ++++++++++++++++++++++++++++++
>     >  sheep/sheep_priv.h        |   21 ++++++++++++
>     >  sheep/vdi.c               |   84
>     ++++++++++++++++++++++++++++++---------------
>     >  6 files changed, 154 insertions(+), 27 deletions(-)
>     >
>     > diff --git a/include/sheep.h b/include/sheep.h
>     > index 7e287c4..e941dc1 100644
>     > --- a/include/sheep.h
>     > +++ b/include/sheep.h
>     > @@ -46,6 +46,7 @@
>     >  #define SD_OP_TRACE          0x95
>     >  #define SD_OP_TRACE_CAT      0x96
>     >  #define SD_OP_STAT_RECOVERY  0x97
>     > +#define SD_OP_NOTIFY_VDI_DEL 0x98
>     >
>     >  #define SD_FLAG_CMD_IO_LOCAL   0x0010
>     >  #define SD_FLAG_CMD_RECOVERY 0x0020
>     > diff --git a/sheep/group.c b/sheep/group.c
>     > index c7fd387..1f1aa8d 100644
>     > --- a/sheep/group.c
>     > +++ b/sheep/group.c
>     > @@ -1314,6 +1314,7 @@ int create_cluster(int port, int64_t zone,
>     int nr_vnodes)
>     >       INIT_LIST_HEAD(&sys->req_wait_for_obj_list);
>     >       INIT_LIST_HEAD(&sys->consistent_obj_list);
>     >       INIT_LIST_HEAD(&sys->blocking_conn_list);
>     > +     INIT_LIST_HEAD(&sys->deletion_work_list);
>     >
>     >       INIT_LIST_HEAD(&sys->request_queue);
>     >       INIT_LIST_HEAD(&sys->event_queue);
>     > diff --git a/sheep/object_list_cache.c b/sheep/object_list_cache.c
>     > index 28cdbbc..0aa93e5 100644
>     > --- a/sheep/object_list_cache.c
>     > +++ b/sheep/object_list_cache.c
>     > @@ -149,3 +149,20 @@ int get_obj_list(const struct sd_list_req
>     *hdr, struct sd_list_rsp *rsp, void *d
>     >
>     >       return res;
>     >  }
>     > +
>     > +int del_vdi_from_objlist_cache(uint64_t *objs, int count)
>     > +{
>     > +     int i;
>     > +
>     > +     dprintf("%d\n", count);
>     > +     for (i = 0; i < count; i++) {
>     > +             pthread_rwlock_wrlock(&obj_list_cache.lock);
>     > +             if (!objlist_cache_rb_remove(&obj_list_cache.root,
>     objs[i])) {
>     > +                     dprintf("remove oid %" PRIx64 " from
>     objlist cache\n", objs[i]);
>     > +                     obj_list_cache.cache_size--;
>     > +             }
>     > +             pthread_rwlock_unlock(&obj_list_cache.lock);
>     > +     }
>     > +
>     > +     return 0;
>     > +}
>     > diff --git a/sheep/ops.c b/sheep/ops.c
>     > index b6f8eb2..5763b6f 100644
>     > --- a/sheep/ops.c
>     > +++ b/sheep/ops.c
>     > @@ -550,6 +550,57 @@ static int cluster_cleanup(const struct
>     sd_req *req, struct sd_rsp *rsp,
>     >       return ret;
>     >  }
>     >
>     > +static void objlist_cache_del_vdi_work(struct work *work)
>     > +{
>     > +     struct deletion_work *dw = container_of(work, struct
>     deletion_work, work);
>     > +
>     > +     del_vdi_from_obThis name looks confusing, I guess you mean
>     undeleted objects as in
>     dw->deleted_objs, dw->deleted_count.
>
> No, here I do mean deleted objects indeed, by this buffer I record 
> which objects are deleted,
> and then we can notify this list to the other nodes.
>
>     > +
>     >       if (dw->delete_error) {
>     >               write_object(dw->vnodes, dw->epoch,
>     vid_to_vdi_oid(vdi_id),
>     >                            (void *)inode, sizeof(*inode), 0, 0,
>     nr_copies, 0);
>     > @@ -505,8 +535,8 @@ static void delete_one_done(struct work *work)
>     >       free(dw->buf);
>     >       free(dw);
>     >
>     > -     if (!list_empty(&deletion_work_list)) {
>     > -             dw = list_first_entry(&deletion_work_list,
>     > +     if (!list_empty(&sys->deletion_work_list)) {
>     > +             dw = list_first_entry(&sys->deletion_work_list,
>     >                                     struct deletion_work,
>     dw_siblings);
>     >
>     >               queue_work(sys->deletion_wqueue, &dw->work);
>     > @@ -647,12 +677,12 @@ int start_deletion(uint32_t vid, uint32_t
>     epoch)
>     >       if (dw->count == 0)
>     >               goto out;
>     >
>     > -     if (!list_empty(&deletion_work_list)) {
>     > -             list_add_tail(&dw->dw_siblings, &deletion_work_list);
>     > +     if (!list_empty(&sys->deletion_work_list)) {
>     > +             list_add_tail(&dw->dw_siblings,
>     &sys->deletion_work_list);
>     >               goto out;
>     >       }
>     >
>     > -     list_add_tail(&dw->dw_siblings, &deletion_work_list);
>     > +     list_add_tail(&dw->dw_siblings, &sys->deletion_work_list);
>     >       queue_work(sys->deletion_wqueue, &dw->work);
>
>
>     it looks redundant to have deletion_work_list here to get FIFO
>     semantics.
>
>     We only have one worker for deletion_wqueue, isn't it FIFO already?
>
>
> Maybe kazutaka can explain this.
>
>     >  out:
>     >       return SD_RES_SUCCESS;
>
>
>     This code looks really confusing, I think you need rework the logic
>     first, take object recovery as an example, recovery_object just
>     recovers
>     the targeted object once at a time. Your delete_one I guess serves the
>     same purpose as its name suggests, but you actually try to delete all
>     the objects in one go.
>
>
> It's something different from recovery_object, in recovery_object it 
> means recovering
> a single object, but here delete_one means deleting a single VDI, not 
> a single object,
> as we see, maybe there're more than one VDI needs to be deleted in one 
> deletion work,
> so what delete_one does is to delete on of the VDIs.
>
>     By the way, please run script/checkpatch.pl
>     <http://checkpatch.pl>, there are some style problems.
>
> OK, I will check my latest patch with this script, maybe this version 
> is dropped, as I just submitted
> the 5th version yesterday.
>
Sorry, I was confused by gmail, this is my 5th version.

thanks,

levin

> thanks,
>
> levin
>
>     Thanks,
>     Yuan
>
>
>
> -- 
> levin
> basiccoder.com <http://basiccoder.com>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.wpkg.org/pipermail/sheepdog/attachments/20120505/5e9c069c/attachment.html>


More information about the sheepdog mailing list