<br><br>On Saturday, May 5, 2012, Liu Yuan wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 05/04/2012 02:49 PM, levin li wrote:<br>
<br>
> Cluster recovery may cause objects migrated from one node<br>
> to another, but the object list cache doesn't change, when<br>
> deleting an object we can not find the right node in whose<br>
> cache the id stays, so we need to notify the deletion object<br>
> list to all the node to make them delete the specified<br>
> object id from object list cache.<br>
><br>
> I reused the deletion_work_list and deletion_wqueue for the<br>
> object list cleanup work, so that we save some memory by not<br>
> starting a new work queue.<br>
><br>
> The del_vdi is called in worker thread, but it's locked by the<br>
> notification mechanism, other operations about deletion_work_list<br>
> and deletion_wqueue are done in main thread, so there's no need<br>
> to concern about the race condition.<br>
><br>
> Signed-off-by: levin li <<a>xingke.lwp@taobao.com</a>><br>
> ---<br>
> include/sheep.h | 1 +<br>
> sheep/group.c | 1 +<br>
> sheep/object_list_cache.c | 17 +++++++++<br>
> sheep/ops.c | 57 ++++++++++++++++++++++++++++++<br>
> sheep/sheep_priv.h | 21 ++++++++++++<br>
> sheep/vdi.c | 84 ++++++++++++++++++++++++++++++---------------<br>
> 6 files changed, 154 insertions(+), 27 deletions(-)<br>
><br>
> diff --git a/include/sheep.h b/include/sheep.h<br>
> index 7e287c4..e941dc1 100644<br>
> --- a/include/sheep.h<br>
> +++ b/include/sheep.h<br>
> @@ -46,6 +46,7 @@<br>
> #define SD_OP_TRACE 0x95<br>
> #define SD_OP_TRACE_CAT 0x96<br>
> #define SD_OP_STAT_RECOVERY 0x97<br>
> +#define SD_OP_NOTIFY_VDI_DEL 0x98<br>
><br>
> #define SD_FLAG_CMD_IO_LOCAL 0x0010<br>
> #define SD_FLAG_CMD_RECOVERY 0x0020<br>
> diff --git a/sheep/group.c b/sheep/group.c<br>
> index c7fd387..1f1aa8d 100644<br>
> --- a/sheep/group.c<br>
> +++ b/sheep/group.c<br>
> @@ -1314,6 +1314,7 @@ int create_cluster(int port, int64_t zone, int nr_vnodes)<br>
> INIT_LIST_HEAD(&sys->req_wait_for_obj_list);<br>
> INIT_LIST_HEAD(&sys->consistent_obj_list);<br>
> INIT_LIST_HEAD(&sys->blocking_conn_list);<br>
> + INIT_LIST_HEAD(&sys->deletion_work_list);<br>
><br>
> INIT_LIST_HEAD(&sys->request_queue);<br>
> INIT_LIST_HEAD(&sys->event_queue);<br>
> diff --git a/sheep/object_list_cache.c b/sheep/object_list_cache.c<br>
> index 28cdbbc..0aa93e5 100644<br>
> --- a/sheep/object_list_cache.c<br>
> +++ b/sheep/object_list_cache.c<br>
> @@ -149,3 +149,20 @@ int get_obj_list(const struct sd_list_req *hdr, struct sd_list_rsp *rsp, void *d<br>
><br>
> return res;<br>
> }<br>
> +<br>
> +int del_vdi_from_objlist_cache(uint64_t *objs, int count)<br>
> +{<br>
> + int i;<br>
> +<br>
> + dprintf("%d\n", count);<br>
> + for (i = 0; i < count; i++) {<br>
> + pthread_rwlock_wrlock(&obj_list_cache.lock);<br>
> + if (!objlist_cache_rb_remove(&obj_list_cache.root, objs[i])) {<br>
> + dprintf("remove oid %" PRIx64 " from objlist cache\n", objs[i]);<br>
> + obj_list_cache.cache_size--;<br>
> + }<br>
> + pthread_rwlock_unlock(&obj_list_cache.lock);<br>
> + }<br>
> +<br>
> + return 0;<br>
> +}<br>
> diff --git a/sheep/ops.c b/sheep/ops.c<br>
> index b6f8eb2..5763b6f 100644<br>
> --- a/sheep/ops.c<br>
> +++ b/sheep/ops.c<br>
> @@ -550,6 +550,57 @@ static int cluster_cleanup(const struct sd_req *req, struct sd_rsp *rsp,<br>
> return ret;<br>
> }<br>
><br>
> +static void objlist_cache_del_vdi_work(struct work *work)<br>
> +{<br>
> + struct deletion_work *dw = container_of(work, struct deletion_work, work);<br>
> +<br>
> + del_vdi_from_obThis name looks confusing, I guess you mean undeleted objects as in<br>
dw->deleted_objs, dw->deleted_count.<br>
<br></blockquote><div>No, here I do mean deleted objects indeed, by this buffer I record which objects are deleted,</div><div>and then we can notify this list to the other nodes.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
> +<br>
> if (dw->delete_error) {<br>
> write_object(dw->vnodes, dw->epoch, vid_to_vdi_oid(vdi_id),<br>
> (void *)inode, sizeof(*inode), 0, 0, nr_copies, 0);<br>
> @@ -505,8 +535,8 @@ static void delete_one_done(struct work *work)<br>
> free(dw->buf);<br>
> free(dw);<br>
><br>
> - if (!list_empty(&deletion_work_list)) {<br>
> - dw = list_first_entry(&deletion_work_list,<br>
> + if (!list_empty(&sys->deletion_work_list)) {<br>
> + dw = list_first_entry(&sys->deletion_work_list,<br>
> struct deletion_work, dw_siblings);<br>
><br>
> queue_work(sys->deletion_wqueue, &dw->work);<br>
> @@ -647,12 +677,12 @@ int start_deletion(uint32_t vid, uint32_t epoch)<br>
> if (dw->count == 0)<br>
> goto out;<br>
><br>
> - if (!list_empty(&deletion_work_list)) {<br>
> - list_add_tail(&dw->dw_siblings, &deletion_work_list);<br>
> + if (!list_empty(&sys->deletion_work_list)) {<br>
> + list_add_tail(&dw->dw_siblings, &sys->deletion_work_list);<br>
> goto out;<br>
> }<br>
><br>
> - list_add_tail(&dw->dw_siblings, &deletion_work_list);<br>
> + list_add_tail(&dw->dw_siblings, &sys->deletion_work_list);<br>
> queue_work(sys->deletion_wqueue, &dw->work);<br>
<br>
<br>
it looks redundant to have deletion_work_list here to get FIFO semantics.<br>
<br>
We only have one worker for deletion_wqueue, isn't it FIFO already?<br>
<br></blockquote><div><br></div><div>Maybe <span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:13px;white-space:nowrap;background-color:rgb(255,255,255)">kazutaka can explain this.</span></div><div><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:13px;white-space:nowrap;background-color:rgb(255,255,255)"><br>
</span></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
> out:<br>
> return SD_RES_SUCCESS;<br>
<br>
<br>
This code looks really confusing, I think you need rework the logic<br>
first, take object recovery as an example, recovery_object just recovers<br>
the targeted object once at a time. Your delete_one I guess serves the<br>
same purpose as its name suggests, but you actually try to delete all<br>
the objects in one go.<br>
<br></blockquote><div><br></div><div>It's something different from recovery_object, in recovery_object it means recovering</div><div>a single object, but here delete_one means deleting a single VDI, not a single object,</div>
<div>as we see, maybe there're more than one VDI needs to be deleted in one deletion work,</div><div>so what delete_one does is to delete on of the VDIs.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
By the way, please run script/<a href="http://checkpatch.pl" target="_blank">checkpatch.pl</a>, there are some style problems.<br>
<br></blockquote><div>OK, I will check my latest patch with this script, maybe this version is dropped, as I just submitted</div><div>the 5th version yesterday.</div><div><br></div><div>thanks,</div><div><br></div><div>levin</div>
<div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Thanks,<br>
Yuan<br>
</blockquote><br><br>-- <br><font color="#999999">levin</font><div><a href="http://basiccoder.com" style="background-color:rgb(255,255,255)" target="_blank"><font color="#999999">basiccoder.com</font></a></div><br>