[sheepdog] [PATCH v1 2/3] sheepfs: make fetching data for cache become async operation

Thu Mar 6 14:06:41 CET 2014

On Thu, Mar 06, 2014 at 06:08:21PM +0800, Robin Dong wrote:
> 2014-03-06 17:23 GMT+08:00 Liu Yuan <namei.unix at gmail.com>:
> 
> > On Thu, Mar 06, 2014 at 03:47:20PM +0800, Robin Dong wrote:
> > > From: Robin Dong <sanbai at taobao.com>
> > >
> > > Currently, we have added a cache for the http interface of sheepfs. But it
> > > only fetches data via an http request once a fuse operation has read
> > > past the end of the cache.
> > >
> > > For better performance, we use the 'double buffer' technique: fuse reads
> > > one buffer while a newly created thread reads upcoming data into another
> > > buffer at the same time. This makes read operations smoother
> > > and faster.
> > >
> > > We use two pointers, 'ready' and 'prepare', to point to the double buffers
> > > and use the classic 'consumer and producer model' to avoid race conditions.
> > >
> > > Signed-off-by: Robin Dong <sanbai at taobao.com>
> > > ---
> > >  sheepfs/http.c | 165 +++++++++++++++++++++++++++++++++++++++++++--------------
> > >  1 file changed, 124 insertions(+), 41 deletions(-)
> > >
> > > diff --git a/sheepfs/http.c b/sheepfs/http.c
> > > index 7df05ad..5610110 100644
> > > --- a/sheepfs/http.c
> > > +++ b/sheepfs/http.c
> > > @@ -19,6 +19,7 @@
> > >  #include <stdio.h>
> > >  #include <time.h>
> > >  #include <curl/curl.h>
> > > +#include <semaphore.h>
> > >
> > >  #include "strbuf.h"
> > >  #include "sheepfs.h"
> > > @@ -157,16 +158,17 @@ static size_t curl_read_object(const char *url, char *buf, size_t size,
> > >               }
> > >               if ((size_t)content_length > size) {
> > >                       sheepfs_pr("Failed to get correct CONTENT_LENGTH, "
> > > -                            "content_length: %"PRIu64", get_size: %"PRIu64,
> > > -                            (size_t)content_length, size);
> > > +                                "content_length: %"PRIu64", get_size: %"
> > > +                                PRIu64, (size_t)content_length, size);
> > >                       size = 0;
> > >               } else {
> > > -                     sd_debug("Read out %"PRIu64" data from %s", size, url);
> > > +                     sheepfs_pr("Read out %"PRIu64" data from %s", size,
> > > +                                url);
> > >                       size = (size_t)content_length;
> > >               }
> > >       } else {
> > >               sheepfs_pr("Failed to call libcurl res: %s, url: %s",
> > > -                    curl_easy_strerror(res), url);
> > > +                        curl_easy_strerror(res), url);
> > >               size = 0;
> > >       }
> > >  out:
> > > @@ -234,19 +236,69 @@ out:
> > >  /* no rationale */
> > >  #define CACHE_SIZE   (64 * 1024 * 1024)
> > >
> > > -struct cache_handle {
> > > +struct cache_s {
> > >       char *mem;
> > >       off_t offset;
> > >       size_t size;
> > >  };
> >
> > What does _s mean? I'd suggest struct read_cache
> >
> "_s" means "struct"
> 
> 
> >
> > >
> > > +struct cache_handle {
> > > +     char            path[PATH_MAX];
> > > +     struct cache_s  *ready;
> > > +     struct cache_s  *prepare;
> > > +     pthread_t       fetch_thread;
> > > +     sem_t           ready_sem;
> > > +     sem_t           prepare_sem;
> >
> > Why choose sem_t over a pthread mutex? Any reason?
> >
> Because a pthread mutex is very hard to use in our model. For example:
> 
>     (lock and unlock many times)
>     ....
>     pthread_mutex_unlock()
>     pthread_mutex_destroy()
> 
> and
> 
>     (lock and unlock many times)
>     ....
>     pthread_mutex_lock()
>     pthread_mutex_destroy()
> 
> pthread_mutex_destroy() will return EBUSY and cause a panic in both cases
> above. In the "consumer and producer model", the consumer (or producer) may
> exit at any point, which means the pthread mutex could be either locked or
> unlocked at that time.
> How could we destroy a pthread mutex that may be either locked or unlocked?
> 
> Using PTHREAD_MUTEX_ERRORCHECK is another pain in the neck: it does not
> allow one thread to lock the same mutex twice.
> 
> pthread_cond_t may lose a signal, so the best choice is the standard semaphore.

I think the above rationale should be included in the source.
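
For reference, here is a minimal sketch of how the two semaphores could hand
the 'ready' and 'prepare' buffers back and forth. It is not the code from the
patch: struct demo_buf, struct demo_handle, wait_sem(), fetch_thread() and
run_double_buffer() are hypothetical names, each buffer holds a single int
instead of a 64MB chunk, and it assumes a platform with POSIX unnamed
semaphores (e.g. Linux).

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

#define NR_CHUNKS 8	/* hypothetical number of chunks to stream */

/* Hypothetical stand-in for struct cache_s: one int instead of a buffer. */
struct demo_buf {
	int chunk;
};

struct demo_handle {
	struct demo_buf bufs[2];
	struct demo_buf *ready;		/* consumer (fuse read) side */
	struct demo_buf *prepare;	/* producer (fetch thread) side */
	sem_t ready_sem;		/* posted once 'prepare' has been filled */
	sem_t prepare_sem;		/* posted once 'prepare' may be refilled */
};

/* sem_wait() may be interrupted by a signal handler; retry on EINTR. */
static void wait_sem(sem_t *sem)
{
	while (sem_wait(sem) == -1 && errno == EINTR)
		;
}

/* Producer: fill 'prepare' whenever the consumer has released it. */
static void *fetch_thread(void *arg)
{
	struct demo_handle *h = arg;

	for (int i = 0; i < NR_CHUNKS; i++) {
		wait_sem(&h->prepare_sem);
		h->prepare->chunk = i;	/* stands in for the http fetch */
		sem_post(&h->ready_sem);
	}
	return NULL;
}

/* Returns 1 if every chunk arrived in order, 0 if not, -1 on setup failure. */
int run_double_buffer(void)
{
	struct demo_handle h;
	pthread_t tid;
	int in_order = 1;

	h.ready = &h.bufs[0];
	h.prepare = &h.bufs[1];
	if (sem_init(&h.ready_sem, 0, 0) == -1)
		return -1;
	/* Start at 1: 'prepare' is free, so the producer can run immediately. */
	if (sem_init(&h.prepare_sem, 0, 1) == -1) {
		sem_destroy(&h.ready_sem);
		return -1;
	}
	if (pthread_create(&tid, NULL, fetch_thread, &h) != 0) {
		sem_destroy(&h.prepare_sem);
		sem_destroy(&h.ready_sem);
		return -1;
	}

	/* Consumer: wait for a filled buffer, swap, then let the producer go on. */
	for (int i = 0; i < NR_CHUNKS; i++) {
		struct demo_buf *tmp;

		wait_sem(&h.ready_sem);
		tmp = h.ready;
		h.ready = h.prepare;
		h.prepare = tmp;
		sem_post(&h.prepare_sem);	/* producer refills while we read */
		if (h.ready->chunk != i)
			in_order = 0;
	}

	pthread_join(tid, NULL);
	/*
	 * prepare_sem still has a count of 1 here. sem_destroy() only needs
	 * that no thread is blocked on the semaphore, whatever its count.
	 */
	sem_destroy(&h.prepare_sem);
	sem_destroy(&h.ready_sem);
	return in_order;
}
```

Calling run_double_buffer() from a small main() should return 1. Note that a
sem_post() issued before the matching sem_wait() is remembered in the count,
which is the property a bare pthread_cond_signal() does not give us.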

Thanks
Yuan