<br><br>On Saturday, May 5, 2012, Liu Yuan  wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 05/04/2012 02:49 PM, levin li wrote:<br>

<br>

> Cluster recovery may cause objects migrated from one node<br>

> to another, but the object list cache doesn't change, when<br>

> deleting an object we can not find the right node in whose<br>

> cache the id stays, so we need to notify the deletion object<br>

> list to all the node to make them delete the specified<br>

> object id from object list cache.<br>

><br>

> I reused the deletion_work_list and deletion_wqueue for the<br>

> object list cleanup work, so that we save some memory by not<br>

> starting a new work queue.<br>

><br>

> The del_vdi is called in worker thread, but it's locked by the<br>

> notification mechanism, other operations about deletion_work_list<br>

> and deletion_wqueue are done in main thread, so there's no need<br>

> to concern about the race condition.<br>

><br>

> Signed-off-by: levin li <<a>xingke.lwp@taobao.com</a>><br>

> ---<br>

>  include/sheep.h           |    1 +<br>

>  sheep/group.c             |    1 +<br>

>  sheep/object_list_cache.c |   17 +++++++++<br>

>  sheep/ops.c               |   57 ++++++++++++++++++++++++++++++<br>

>  sheep/sheep_priv.h        |   21 ++++++++++++<br>

>  sheep/vdi.c               |   84 ++++++++++++++++++++++++++++++---------------<br>

>  6 files changed, 154 insertions(+), 27 deletions(-)<br>

><br>

> diff --git a/include/sheep.h b/include/sheep.h<br>

> index 7e287c4..e941dc1 100644<br>

> --- a/include/sheep.h<br>

> +++ b/include/sheep.h<br>

> @@ -46,6 +46,7 @@<br>

>  #define SD_OP_TRACE          0x95<br>

>  #define SD_OP_TRACE_CAT      0x96<br>

>  #define SD_OP_STAT_RECOVERY  0x97<br>

> +#define SD_OP_NOTIFY_VDI_DEL 0x98<br>

><br>

>  #define SD_FLAG_CMD_IO_LOCAL   0x0010<br>

>  #define SD_FLAG_CMD_RECOVERY 0x0020<br>

> diff --git a/sheep/group.c b/sheep/group.c<br>

> index c7fd387..1f1aa8d 100644<br>

> --- a/sheep/group.c<br>

> +++ b/sheep/group.c<br>

> @@ -1314,6 +1314,7 @@ int create_cluster(int port, int64_t zone, int nr_vnodes)<br>

>       INIT_LIST_HEAD(&sys->req_wait_for_obj_list);<br>

>       INIT_LIST_HEAD(&sys->consistent_obj_list);<br>

>       INIT_LIST_HEAD(&sys->blocking_conn_list);<br>

> +     INIT_LIST_HEAD(&sys->deletion_work_list);<br>

><br>

>       INIT_LIST_HEAD(&sys->request_queue);<br>

>       INIT_LIST_HEAD(&sys->event_queue);<br>

> diff --git a/sheep/object_list_cache.c b/sheep/object_list_cache.c<br>

> index 28cdbbc..0aa93e5 100644<br>

> --- a/sheep/object_list_cache.c<br>

> +++ b/sheep/object_list_cache.c<br>

> @@ -149,3 +149,20 @@ int get_obj_list(const struct sd_list_req *hdr, struct sd_list_rsp *rsp, void *d<br>

><br>

>       return res;<br>

>  }<br>

> +<br>

> +int del_vdi_from_objlist_cache(uint64_t *objs, int count)<br>

> +{<br>

> +     int i;<br>

> +<br>

> +     dprintf("%d\n", count);<br>

> +     for (i = 0; i < count; i++) {<br>

> +             pthread_rwlock_wrlock(&obj_list_cache.lock);<br>

> +             if (!objlist_cache_rb_remove(&obj_list_cache.root, objs[i])) {<br>

> +                     dprintf("remove oid %" PRIx64 " from objlist cache\n", objs[i]);<br>

> +                     obj_list_cache.cache_size--;<br>

> +             }<br>

> +             pthread_rwlock_unlock(&obj_list_cache.lock);<br>

> +     }<br>

> +<br>

> +     return 0;<br>

> +}<br>

> diff --git a/sheep/ops.c b/sheep/ops.c<br>

> index b6f8eb2..5763b6f 100644<br>

> --- a/sheep/ops.c<br>

> +++ b/sheep/ops.c<br>

> @@ -550,6 +550,57 @@ static int cluster_cleanup(const struct sd_req *req, struct sd_rsp *rsp,<br>

>       return ret;<br>

>  }<br>

><br>

> +static void objlist_cache_del_vdi_work(struct work *work)<br>

> +{<br>

> +     struct deletion_work *dw = container_of(work, struct deletion_work, work);<br>

> +<br>

> +     del_vdi_from_obThis name looks confusing, I guess you mean undeleted objects as in<br>

dw->deleted_objs, dw->deleted_count.<br>

<br></blockquote><div>No, here I do mean deleted objects indeed, by this buffer I record which objects are deleted,</div><div>and then we can notify this list to the other nodes.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


> +<br>

>       if (dw->delete_error) {<br>

>               write_object(dw->vnodes, dw->epoch, vid_to_vdi_oid(vdi_id),<br>

>                            (void *)inode, sizeof(*inode), 0, 0, nr_copies, 0);<br>

> @@ -505,8 +535,8 @@ static void delete_one_done(struct work *work)<br>

>       free(dw->buf);<br>

>       free(dw);<br>

><br>

> -     if (!list_empty(&deletion_work_list)) {<br>

> -             dw = list_first_entry(&deletion_work_list,<br>

> +     if (!list_empty(&sys->deletion_work_list)) {<br>

> +             dw = list_first_entry(&sys->deletion_work_list,<br>

>                                     struct deletion_work, dw_siblings);<br>

><br>

>               queue_work(sys->deletion_wqueue, &dw->work);<br>

> @@ -647,12 +677,12 @@ int start_deletion(uint32_t vid, uint32_t epoch)<br>

>       if (dw->count == 0)<br>

>               goto out;<br>

><br>

> -     if (!list_empty(&deletion_work_list)) {<br>

> -             list_add_tail(&dw->dw_siblings, &deletion_work_list);<br>

> +     if (!list_empty(&sys->deletion_work_list)) {<br>

> +             list_add_tail(&dw->dw_siblings, &sys->deletion_work_list);<br>

>               goto out;<br>

>       }<br>

><br>

> -     list_add_tail(&dw->dw_siblings, &deletion_work_list);<br>

> +     list_add_tail(&dw->dw_siblings, &sys->deletion_work_list);<br>

>       queue_work(sys->deletion_wqueue, &dw->work);<br>

<br>

<br>

it looks redundant to have deletion_work_list here to get FIFO semantics.<br>

<br>

We only have one worker for deletion_wqueue, isn't it FIFO already?<br>

<br></blockquote><div><br></div><div>Maybe <span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:13px;white-space:nowrap;background-color:rgb(255,255,255)">kazutaka can explain this.</span></div><div><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:13px;white-space:nowrap;background-color:rgb(255,255,255)"><br>

</span></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

>  out:<br>

>       return SD_RES_SUCCESS;<br>

<br>

<br>

This code looks really confusing, I think you need rework the logic<br>

first, take object recovery as an example, recovery_object just recovers<br>

the targeted object once at a time. Your delete_one I guess serves the<br>

same purpose as its name suggests, but you actually try to delete all<br>

the objects in one go.<br>

<br></blockquote><div><br></div><div>It's something different from recovery_object, in recovery_object it means recovering</div><div>a single object, but here delete_one means deleting a single VDI, not a single object,</div>

<div>as we see, maybe there're more than one VDI needs to be deleted in one deletion work,</div><div>so what delete_one does is to delete on of the VDIs.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


By the way, please run script/<a href="http://checkpatch.pl" target="_blank">checkpatch.pl</a>, there are some style problems.<br>

<br></blockquote><div>OK, I will check my latest patch with this script, maybe this version is dropped, as I just submitted</div><div>the 5th version yesterday.</div><div><br></div><div>thanks,</div><div><br></div><div>levin</div>

<div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Thanks,<br>

Yuan<br>

</blockquote><br><br>-- <br><font color="#999999">levin</font><div><a href="http://basiccoder.com" style="background-color:rgb(255,255,255)" target="_blank"><font color="#999999">basiccoder.com</font></a></div><br>