<div dir="ltr"><br><div class="gmail_extra"><br><br><div class="gmail_quote">2014-03-06 17:23 GMT+08:00 Liu Yuan <span dir="ltr"><<a href="mailto:namei.unix@gmail.com" target="_blank">namei.unix@gmail.com</a>></span>:<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div class=""><div class="h5">On Thu, Mar 06, 2014 at 03:47:20PM +0800, Robin Dong wrote:<br>


> From: Robin Dong <<a href="mailto:sanbai@taobao.com">sanbai@taobao.com</a>><br>

><br>

> Currently, we have added a cache for the HTTP interface of sheepfs, but it<br>

> only fetches data over HTTP once a fuse read has run past the end of<br>

> the cache.<br>

><br>

> For better performance, we use a 'double buffer' technique: fuse reads<br>

> one buffer while a newly created thread reads upcoming data into another<br>

> buffer at the same time. This makes read operations smoother<br>

> and faster.<br>

><br>

> We use two pointers, 'ready' and 'prepare', to point to the double buffers<br>

> and use the classic 'consumer and producer model' to avoid race conditions.<br>

><br>

> Signed-off-by: Robin Dong <<a href="mailto:sanbai@taobao.com">sanbai@taobao.com</a>><br>

> ---<br>

>  sheepfs/http.c | 165 +++++++++++++++++++++++++++++++++++++++++++--------------<br>

>  1 file changed, 124 insertions(+), 41 deletions(-)<br>

><br>

> diff --git a/sheepfs/http.c b/sheepfs/http.c<br>

> index 7df05ad..5610110 100644<br>

> --- a/sheepfs/http.c<br>

> +++ b/sheepfs/http.c<br>

> @@ -19,6 +19,7 @@<br>

>  #include <stdio.h><br>

>  #include <time.h><br>

>  #include <curl/curl.h><br>

> +#include <semaphore.h><br>

><br>

>  #include "strbuf.h"<br>

>  #include "sheepfs.h"<br>

> @@ -157,16 +158,17 @@ static size_t curl_read_object(const char *url, char *buf, size_t size,<br>

>               }<br>

>               if ((size_t)content_length > size) {<br>

>                       sheepfs_pr("Failed to get correct CONTENT_LENGTH, "<br>

> -                            "content_length: %"PRIu64", get_size: %"PRIu64,<br>

> -                            (size_t)content_length, size);<br>

> +                                "content_length: %"PRIu64", get_size: %"<br>

> +                                PRIu64, (size_t)content_length, size);<br>

>                       size = 0;<br>

>               } else {<br>

> -                     sd_debug("Read out %"PRIu64" data from %s", size, url);<br>

> +                     sheepfs_pr("Read out %"PRIu64" data from %s", size,<br>

> +                                url);<br>

>                       size = (size_t)content_length;<br>

>               }<br>

>       } else {<br>

>               sheepfs_pr("Failed to call libcurl res: %s, url: %s",<br>

> -                    curl_easy_strerror(res), url);<br>

> +                        curl_easy_strerror(res), url);<br>

>               size = 0;<br>

>       }<br>

>  out:<br>

> @@ -234,19 +236,69 @@ out:<br>

>  /* no rationale */<br>

>  #define CACHE_SIZE   (64 * 1024 * 1024)<br>

><br>

> -struct cache_handle {<br>

> +struct cache_s {<br>

>       char *mem;<br>

>       off_t offset;<br>

>       size_t size;<br>

>  };<br>

<br>

</div></div>What does _s mean? I'd suggest struct read_cache<br></blockquote><div>"_s" means "struct".</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">


<div class=""><br>

><br>

> +struct cache_handle {<br>

> +     char            path[PATH_MAX];<br>

> +     struct cache_s  *ready;<br>

> +     struct cache_s  *prepare;<br>

> +     pthread_t       fetch_thread;<br>

> +     sem_t           ready_sem;<br>

> +     sem_t           prepare_sem;<br>

<br>

</div>Why choose sem_t over a pthread mutex? Any reason?<br></blockquote><div>Because a pthread mutex is very hard to use in our model. For example:</div><div><br></div><div>    (lock and unlock many times) </div><div>    ....</div>

<div>    pthread_mutex_unlock()</div><div>    pthread_mutex_destroy()</div><div><br></div><div>and</div><div><br></div><div><div>    (lock and unlock many times) </div><div>    ....</div><div>    pthread_mutex_lock()</div>

<div>    pthread_mutex_destroy()</div></div><div><br></div><div>The destroy can return EBUSY and cause a panic in both cases above. </div><div>In the "consumer and producer model", the consumer (or producer) can exit at any point, </div>

<div>which means the pthread mutex could be either locked or unlocked at that time. </div><div>How can we safely destroy a pthread mutex that may be either locked or unlocked?</div><div><br></div><div>Using PTHREAD_MUTEX_ERRORCHECK is another pain in the neck: it does not allow</div>

<div>one thread to lock the same mutex twice.</div><div><br></div><div>A pthread_cond_t signal can be lost if no thread is waiting when it is sent, so the best choice is the standard semaphore.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">


<div class=""><br>

> +     bool            stop;<br>

> +     uatomic_bool    fetching;<br>

> +     off_t           fetch_offset;<br>

> +     size_t          obj_size;<br>

<br>

</div>I think we should avoid using size_t, since its width differs between 32-bit and 64-bit builds;<br>

use uint64_t explicitly.<br>

<br></blockquote><div>Every "size" in object_read() and object_write() is a size_t; in fact, we</div><div>can't create a file larger than size_t can represent in fuse.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">


Thanks<br>

<span class=""><font color="#888888">Yuan<br>

</font></span></blockquote></div><br><br clear="all"><div><br></div>-- <br>--<br>Best Regards<br>Robin Dong
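<div><br></div><div>P.S. To illustrate the semaphore argument above, the 'ready'/'prepare' handoff can be sketched roughly as follows. This is only a minimal standalone sketch under my own assumptions, not the code from the patch: demo_buf, demo_handle, producer() and run_double_buffer() are hypothetical names, the "data" is just integers, and EINTR from sem_wait() is ignored for brevity. Build with -pthread.</div><div><br></div>

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

#define BUF_LEN 4	/* items per buffer (hypothetical value) */
#define ROUNDS  5	/* number of buffers to hand over (hypothetical value) */

struct demo_buf {
	int data[BUF_LEN];
};

struct demo_handle {
	struct demo_buf bufs[2];
	struct demo_buf *ready;		/* buffer the consumer reads */
	struct demo_buf *prepare;	/* buffer the producer fills */
	sem_t ready_sem;		/* posted when 'prepare' holds fresh data */
	sem_t prepare_sem;		/* posted when 'prepare' may be refilled */
};

/* Producer: fill the 'prepare' buffer, then hand it to the consumer. */
static void *producer(void *arg)
{
	struct demo_handle *h = arg;

	for (int r = 0; r < ROUNDS; r++) {
		sem_wait(&h->prepare_sem);
		for (int i = 0; i < BUF_LEN; i++)
			h->prepare->data[i] = r * BUF_LEN + i;
		sem_post(&h->ready_sem);
	}
	return NULL;
}

/* Consumer: swap the pointers, then read 'ready' while 'prepare' refills.
 * Returns the sum of everything read, or -1 on setup failure. */
static long run_double_buffer(void)
{
	struct demo_handle h;
	pthread_t tid;
	long sum = 0;

	h.ready = &h.bufs[0];
	h.prepare = &h.bufs[1];
	if (sem_init(&h.ready_sem, 0, 0) != 0)
		return -1;
	if (sem_init(&h.prepare_sem, 0, 1) != 0) {
		sem_destroy(&h.ready_sem);
		return -1;
	}
	if (pthread_create(&tid, NULL, producer, &h) != 0) {
		sem_destroy(&h.ready_sem);
		sem_destroy(&h.prepare_sem);
		return -1;
	}

	for (int r = 0; r < ROUNDS; r++) {
		struct demo_buf *tmp;

		sem_wait(&h.ready_sem);		/* wait for a filled buffer */
		tmp = h.ready;			/* swap the two pointers */
		h.ready = h.prepare;
		h.prepare = tmp;
		assert(h.ready != h.prepare);
		sem_post(&h.prepare_sem);	/* the old buffer may be refilled */
		for (int i = 0; i < BUF_LEN; i++)
			sum += h.ready->data[i];
	}

	pthread_join(tid, NULL);
	/* No thread is blocked on either semaphore now, so they can be
	 * destroyed whatever their final count happens to be. */
	sem_destroy(&h.ready_sem);
	sem_destroy(&h.prepare_sem);
	return sum;
}
```

<div><br></div><div>The point of the sketch is the teardown: once the producer thread has been joined, sem_destroy() does not care whether a semaphore was last posted or last waited on, which is the property that is awkward to get from a mutex in this model.</div>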

</div></div>