On Saturday, May 5, 2012, Liu Yuan wrote:

> On 05/04/2012 02:49 PM, levin li wrote:
> >
> > Cluster recovery may cause objects to be migrated from one node
> > to another, but the object list cache doesn't change. When
> > deleting an object we cannot find the right node in whose
> > cache the id stays, so we need to notify the list of deleted
> > objects to all nodes, to make them delete the specified
> > object ids from their object list caches.
> >
> > I reused the deletion_work_list and deletion_wqueue for the
> > object list cleanup work, so that we save some memory by not
> > starting a new work queue.
> >
> > The del_vdi is called in a worker thread, but it's locked by the
> > notification mechanism; other operations on deletion_work_list
> > and deletion_wqueue are done in the main thread, so there's no need
> > to worry about race conditions.
> >
> > Signed-off-by: levin li <xingke.lwp at taobao.com>
> > ---
> >  include/sheep.h           |  1 +
> >  sheep/group.c             |  1 +
> >  sheep/object_list_cache.c | 17 +++++++++
> >  sheep/ops.c               | 57 ++++++++++++++++++++++++++++++
> >  sheep/sheep_priv.h        | 21 ++++++++++++
> >  sheep/vdi.c               | 84 ++++++++++++++++++++++++++++++---------------
> >  6 files changed, 154 insertions(+), 27 deletions(-)
> >
> > diff --git a/include/sheep.h b/include/sheep.h
> > index 7e287c4..e941dc1 100644
> > --- a/include/sheep.h
> > +++ b/include/sheep.h
> > @@ -46,6 +46,7 @@
> >  #define SD_OP_TRACE 0x95
> >  #define SD_OP_TRACE_CAT 0x96
> >  #define SD_OP_STAT_RECOVERY 0x97
> > +#define SD_OP_NOTIFY_VDI_DEL 0x98
> >
> >  #define SD_FLAG_CMD_IO_LOCAL 0x0010
> >  #define SD_FLAG_CMD_RECOVERY 0x0020
> > diff --git a/sheep/group.c b/sheep/group.c
> > index c7fd387..1f1aa8d 100644
> > --- a/sheep/group.c
> > +++ b/sheep/group.c
> > @@ -1314,6 +1314,7 @@ int create_cluster(int port, int64_t zone, int nr_vnodes)
> >  	INIT_LIST_HEAD(&sys->req_wait_for_obj_list);
> >  	INIT_LIST_HEAD(&sys->consistent_obj_list);
> >  	INIT_LIST_HEAD(&sys->blocking_conn_list);
> > +	INIT_LIST_HEAD(&sys->deletion_work_list);
> >
> >  	INIT_LIST_HEAD(&sys->request_queue);
> >  	INIT_LIST_HEAD(&sys->event_queue);
> > diff --git a/sheep/object_list_cache.c b/sheep/object_list_cache.c
> > index 28cdbbc..0aa93e5 100644
> > --- a/sheep/object_list_cache.c
> > +++ b/sheep/object_list_cache.c
> > @@ -149,3 +149,20 @@ int get_obj_list(const struct sd_list_req *hdr, struct sd_list_rsp *rsp, void *d
> >
> >  	return res;
> >  }
> > +
> > +int del_vdi_from_objlist_cache(uint64_t *objs, int count)
> > +{
> > +	int i;
> > +
> > +	dprintf("%d\n", count);
> > +	for (i = 0; i < count; i++) {
> > +		pthread_rwlock_wrlock(&obj_list_cache.lock);
> > +		if (!objlist_cache_rb_remove(&obj_list_cache.root, objs[i])) {
> > +			dprintf("remove oid %" PRIx64 " from objlist cache\n", objs[i]);
> > +			obj_list_cache.cache_size--;
> > +		}
> > +		pthread_rwlock_unlock(&obj_list_cache.lock);
> > +	}
> > +
> > +	return 0;
> > +}
> > diff --git a/sheep/ops.c b/sheep/ops.c
> > index b6f8eb2..5763b6f 100644
> > --- a/sheep/ops.c
> > +++ b/sheep/ops.c
> > @@ -550,6 +550,57 @@ static int cluster_cleanup(const struct sd_req *req, struct sd_rsp *rsp,
> >  	return ret;
> >  }
> >
> > +static void objlist_cache_del_vdi_work(struct work *work)
> > +{
> > +	struct deletion_work *dw = container_of(work, struct deletion_work, work);
> > +
> > +	del_vdi_from_objlist_cache(dw->deleted_objs, dw->deleted_count);
>
> This name looks confusing, I guess you mean undeleted objects as in
> dw->deleted_objs, dw->deleted_count.

No, here I do mean deleted objects indeed: this buffer records which
objects have been deleted, so that we can then notify this list to the
other nodes.
> > +
> >  	if (dw->delete_error) {
> >  		write_object(dw->vnodes, dw->epoch, vid_to_vdi_oid(vdi_id),
> >  			     (void *)inode, sizeof(*inode), 0, 0, nr_copies, 0);
> > @@ -505,8 +535,8 @@ static void delete_one_done(struct work *work)
> >  	free(dw->buf);
> >  	free(dw);
> >
> > -	if (!list_empty(&deletion_work_list)) {
> > -		dw = list_first_entry(&deletion_work_list,
> > +	if (!list_empty(&sys->deletion_work_list)) {
> > +		dw = list_first_entry(&sys->deletion_work_list,
> >  				      struct deletion_work, dw_siblings);
> >
> >  		queue_work(sys->deletion_wqueue, &dw->work);
> > @@ -647,12 +677,12 @@ int start_deletion(uint32_t vid, uint32_t epoch)
> >  	if (dw->count == 0)
> >  		goto out;
> >
> > -	if (!list_empty(&deletion_work_list)) {
> > -		list_add_tail(&dw->dw_siblings, &deletion_work_list);
> > +	if (!list_empty(&sys->deletion_work_list)) {
> > +		list_add_tail(&dw->dw_siblings, &sys->deletion_work_list);
> >  		goto out;
> >  	}
> >
> > -	list_add_tail(&dw->dw_siblings, &deletion_work_list);
> > +	list_add_tail(&dw->dw_siblings, &sys->deletion_work_list);
> >  	queue_work(sys->deletion_wqueue, &dw->work);
>
> it looks redundant to have deletion_work_list here to get FIFO semantics.

We only have one worker for deletion_wqueue, isn't it FIFO already?
Maybe kazutaka can explain this.

> > out:
> >  	return SD_RES_SUCCESS;
>
> This code looks really confusing, I think you need to rework the logic
> first. Take object recovery as an example: recovery_object just
> recovers the targeted object, one at a time. Your delete_one I guess
> serves the same purpose as its name suggests, but you actually try to
> delete all the objects in one go.

It's different from recovery_object: there, the name means recovering a
single object, but here delete_one means deleting a single VDI, not a
single object. More than one VDI may need to be deleted in one deletion
work, so what delete_one does is delete one of those VDIs.
> By the way, please run script/checkpatch.pl, there are some style
> problems.

OK, I will check my latest patch with this script. Maybe this version
can be dropped, as I just submitted the 5th version yesterday.

thanks,
levin

> Thanks,
> Yuan

-- 
levin
basiccoder.com