[Sheepdog] [PATCH v5] remove oids from object list cache when deleting a vdi
levin li
levin108 at gmail.com
Sat May 5 08:22:37 CEST 2012
On 05/05/2012 02:08 PM, levin wrote:
>
>
> On Saturday, May 5, 2012, Liu Yuan wrote:
>
> On 05/04/2012 02:49 PM, levin li wrote:
>
> > Cluster recovery may cause objects to migrate from one node
> > to another, but the object list cache doesn't change, so when
> > deleting an object we cannot find the right node in whose
> > cache the id stays. We therefore need to notify the list of
> > deleted objects to all the nodes so that they remove the
> > specified object ids from their object list caches.
> >
> > I reused the deletion_work_list and deletion_wqueue for the
> > object list cleanup work, so that we save some memory by not
> > starting a new work queue.
> >
> > del_vdi is called in a worker thread, but it is serialized by
> > the notification mechanism; the other operations on
> > deletion_work_list and deletion_wqueue are done in the main
> > thread, so there is no race condition to worry about.
> >
> > Signed-off-by: levin li <xingke.lwp at taobao.com>
> > ---
> > include/sheep.h | 1 +
> > sheep/group.c | 1 +
> > sheep/object_list_cache.c | 17 +++++++++
> > sheep/ops.c | 57 ++++++++++++++++++++++++++++++
> > sheep/sheep_priv.h | 21 ++++++++++++
> > sheep/vdi.c | 84 ++++++++++++++++++++++++++++++---------------
> > 6 files changed, 154 insertions(+), 27 deletions(-)
> >
> > diff --git a/include/sheep.h b/include/sheep.h
> > index 7e287c4..e941dc1 100644
> > --- a/include/sheep.h
> > +++ b/include/sheep.h
> > @@ -46,6 +46,7 @@
> > #define SD_OP_TRACE 0x95
> > #define SD_OP_TRACE_CAT 0x96
> > #define SD_OP_STAT_RECOVERY 0x97
> > +#define SD_OP_NOTIFY_VDI_DEL 0x98
> >
> > #define SD_FLAG_CMD_IO_LOCAL 0x0010
> > #define SD_FLAG_CMD_RECOVERY 0x0020
> > diff --git a/sheep/group.c b/sheep/group.c
> > index c7fd387..1f1aa8d 100644
> > --- a/sheep/group.c
> > +++ b/sheep/group.c
> > @@ -1314,6 +1314,7 @@ int create_cluster(int port, int64_t zone, int nr_vnodes)
> > INIT_LIST_HEAD(&sys->req_wait_for_obj_list);
> > INIT_LIST_HEAD(&sys->consistent_obj_list);
> > INIT_LIST_HEAD(&sys->blocking_conn_list);
> > + INIT_LIST_HEAD(&sys->deletion_work_list);
> >
> > INIT_LIST_HEAD(&sys->request_queue);
> > INIT_LIST_HEAD(&sys->event_queue);
> > diff --git a/sheep/object_list_cache.c b/sheep/object_list_cache.c
> > index 28cdbbc..0aa93e5 100644
> > --- a/sheep/object_list_cache.c
> > +++ b/sheep/object_list_cache.c
> > @@ -149,3 +149,20 @@ int get_obj_list(const struct sd_list_req *hdr, struct sd_list_rsp *rsp, void *d
> >
> > return res;
> > }
> > +
> > +int del_vdi_from_objlist_cache(uint64_t *objs, int count)
> > +{
> > + int i;
> > +
> > + dprintf("%d\n", count);
> > + for (i = 0; i < count; i++) {
> > + pthread_rwlock_wrlock(&obj_list_cache.lock);
> > + if (!objlist_cache_rb_remove(&obj_list_cache.root, objs[i])) {
> > + dprintf("remove oid %" PRIx64 " from objlist cache\n", objs[i]);
> > + obj_list_cache.cache_size--;
> > + }
> > + pthread_rwlock_unlock(&obj_list_cache.lock);
> > + }
> > +
> > + return 0;
> > +}
> > diff --git a/sheep/ops.c b/sheep/ops.c
> > index b6f8eb2..5763b6f 100644
> > --- a/sheep/ops.c
> > +++ b/sheep/ops.c
> > @@ -550,6 +550,57 @@ static int cluster_cleanup(const struct sd_req *req, struct sd_rsp *rsp,
> > return ret;
> > }
> >
> > +static void objlist_cache_del_vdi_work(struct work *work)
> > +{
> > + struct deletion_work *dw = container_of(work, struct deletion_work, work);
> > +
> > + del_vdi_from_objlist_cache(dw->deleted_objs, dw->deleted_count);
>
> This name looks confusing, I guess you mean undeleted objects as in
> dw->deleted_objs, dw->deleted_count.
>
> No, here I do mean deleted objects: this buffer records which
> objects have been deleted, so that we can then notify the other
> nodes of the list.
>
> > +
> > if (dw->delete_error) {
> > write_object(dw->vnodes, dw->epoch, vid_to_vdi_oid(vdi_id),
> >              (void *)inode, sizeof(*inode), 0, 0, nr_copies, 0);
> > @@ -505,8 +535,8 @@ static void delete_one_done(struct work *work)
> > free(dw->buf);
> > free(dw);
> >
> > - if (!list_empty(&deletion_work_list)) {
> > - dw = list_first_entry(&deletion_work_list,
> > + if (!list_empty(&sys->deletion_work_list)) {
> > + dw = list_first_entry(&sys->deletion_work_list,
> > struct deletion_work, dw_siblings);
> >
> > queue_work(sys->deletion_wqueue, &dw->work);
> > @@ -647,12 +677,12 @@ int start_deletion(uint32_t vid, uint32_t epoch)
> > if (dw->count == 0)
> > goto out;
> >
> > - if (!list_empty(&deletion_work_list)) {
> > - list_add_tail(&dw->dw_siblings, &deletion_work_list);
> > + if (!list_empty(&sys->deletion_work_list)) {
> > + list_add_tail(&dw->dw_siblings, &sys->deletion_work_list);
> > goto out;
> > }
> >
> > - list_add_tail(&dw->dw_siblings, &deletion_work_list);
> > + list_add_tail(&dw->dw_siblings, &sys->deletion_work_list);
> > queue_work(sys->deletion_wqueue, &dw->work);
>
>
> it looks redundant to have deletion_work_list here to get FIFO
> semantics.
>
> We only have one worker for deletion_wqueue, isn't it FIFO already?
>
>
> Maybe kazutaka can explain this.
>
> > out:
> > return SD_RES_SUCCESS;
>
>
> This code looks really confusing; I think you need to rework the
> logic first. Take object recovery as an example: recovery_object
> just recovers the targeted object, one at a time. Your delete_one,
> I guess, serves the same purpose as its name suggests, but you
> actually try to delete all the objects in one go.
>
>
> It's different from recovery_object: recovery_object recovers a
> single object, but here delete_one means deleting a single VDI,
> not a single object. There may be more than one VDI to delete in
> a single deletion work, so what delete_one does is delete one of
> those VDIs.
>
> By the way, please run script/checkpatch.pl
> <http://checkpatch.pl>, there are some style problems.
>
> OK, I will check my latest patch with that script. Maybe this
> version has been dropped, as I just submitted the 5th version
> yesterday.
>
Sorry, I was confused by Gmail; this is my 5th version.
thanks,
levin
> thanks,
>
> levin
>
> Thanks,
> Yuan
>
>
>
> --
> levin
> basiccoder.com <http://basiccoder.com>
>