[sheepdog] [PATCH v1 2/3] sheepfs: make fetching data for cache become async operation
Liu Yuan
namei.unix at gmail.com
Thu Mar 6 14:06:41 CET 2014
On Thu, Mar 06, 2014 at 06:08:21PM +0800, Robin Dong wrote:
> 2014-03-06 17:23 GMT+08:00 Liu Yuan <namei.unix at gmail.com>:
>
> > On Thu, Mar 06, 2014 at 03:47:20PM +0800, Robin Dong wrote:
> > > From: Robin Dong <sanbai at taobao.com>
> > >
> > > Currently, we have added a cache for the http interface of sheepfs. But
> > > it only fetches data via an http request once a fuse operation has read
> > > past the end of the cache.
> > >
> > > For better performance, we use the 'double buffer' technique: fuse reads
> > > one buffer while a newly created thread reads future data into another
> > > buffer at the same time. This makes read operations smoother
> > > and faster.
> > >
> > > We use two pointers, 'ready' and 'prepare', to point to the double buffers
> > > and use the classic 'consumer and producer model' to avoid race conditions.
> > >
> > > Signed-off-by: Robin Dong <sanbai at taobao.com>
> > > ---
> > > sheepfs/http.c | 165 +++++++++++++++++++++++++++++++++++++++++++--------------
> > > 1 file changed, 124 insertions(+), 41 deletions(-)
> > >
> > > diff --git a/sheepfs/http.c b/sheepfs/http.c
> > > index 7df05ad..5610110 100644
> > > --- a/sheepfs/http.c
> > > +++ b/sheepfs/http.c
> > > @@ -19,6 +19,7 @@
> > > #include <stdio.h>
> > > #include <time.h>
> > > #include <curl/curl.h>
> > > +#include <semaphore.h>
> > >
> > > #include "strbuf.h"
> > > #include "sheepfs.h"
> > > @@ -157,16 +158,17 @@ static size_t curl_read_object(const char *url, char *buf, size_t size,
> > > }
> > > if ((size_t)content_length > size) {
> > > sheepfs_pr("Failed to get correct CONTENT_LENGTH, "
> > > - "content_length: %"PRIu64", get_size: %"PRIu64,
> > > - (size_t)content_length, size);
> > > + "content_length: %"PRIu64", get_size: %"
> > > + PRIu64, (size_t)content_length, size);
> > > size = 0;
> > > } else {
> > > - sd_debug("Read out %"PRIu64" data from %s", size, url);
> > > + sheepfs_pr("Read out %"PRIu64" data from %s", size,
> > > + url);
> > > size = (size_t)content_length;
> > > }
> > > } else {
> > > sheepfs_pr("Failed to call libcurl res: %s, url: %s",
> > > - curl_easy_strerror(res), url);
> > > + curl_easy_strerror(res), url);
> > > size = 0;
> > > }
> > > out:
> > > @@ -234,19 +236,69 @@ out:
> > > /* no rationale */
> > > #define CACHE_SIZE (64 * 1024 * 1024)
> > >
> > > -struct cache_handle {
> > > +struct cache_s {
> > > char *mem;
> > > off_t offset;
> > > size_t size;
> > > };
> >
> > What does _s mean? I'd suggest struct read_cache
> >
> "_s" means "struct"
>
>
> >
> > >
> > > +struct cache_handle {
> > > + char path[PATH_MAX];
> > > + struct cache_s *ready;
> > > + struct cache_s *prepare;
> > > + pthread_t fetch_thread;
> > > + sem_t ready_sem;
> > > + sem_t prepare_sem;
> >
> > Why choose sem_t over a pthread mutex? Any reason?
> >
> Because a pthread mutex is very hard to use in our model. For example:
>
> (lock and unlock many times)
> ....
> pthread_mutex_unlock()
> pthread_mutex_destroy()
>
> and
>
> (lock and unlock many times)
> ....
> pthread_mutex_lock()
> pthread_mutex_destroy()
>
> pthread_mutex_destroy() will return EBUSY and cause a panic in both cases above.
> In the "consumer and producer model", the consumer (or producer) may exit
> at any point,
> which means the pthread mutex could be either locked or unlocked at that time.
> How could we destroy a pthread mutex that may be locked or unlocked?
>
> Using PTHREAD_MUTEX_ERRORCHECK is another pain in the neck: it does not
> allow
> one thread to lock the same mutex twice.
>
> pthread_cond_t may lose signals, so the best choice is the standard semaphore.
I think the above rationale should be included in the source.
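
To illustrate the kind of comment and hand-off I mean, here is a rough,
standalone sketch. It is not the actual patch: only the field names 'ready',
'prepare', 'ready_sem', 'prepare_sem' and 'fetch_thread' come from the quoted
diff. CHUNK, NCHUNKS, fetch_worker() and run_demo() are made-up names, the
buffers are tiny, and the producer fills them with letters instead of fetching
over http. It assumes Linux with unnamed POSIX semaphores (build with -pthread).

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <string.h>

#define CHUNK   4	/* hypothetical buffer size, tiny for the demo */
#define NCHUNKS 5	/* hypothetical number of chunks to "fetch" */

struct read_cache {
	char mem[CHUNK];
	int seq;		/* which chunk this buffer holds */
};

/*
 * Rationale as stated in this thread (not verified here): the fetch
 * thread or the reader may stop at any point, so a mutex could be left
 * locked or unlocked when it has to be destroyed, and pthread_cond_t
 * may lose signals. Two counting semaphores avoid both concerns.
 */
struct cache_handle {
	struct read_cache bufs[2];
	struct read_cache *ready;	/* consumer reads from this one */
	struct read_cache *prepare;	/* producer fills this one */
	sem_t ready_sem;	/* posted when 'prepare' has been filled */
	sem_t prepare_sem;	/* posted when 'prepare' may be refilled */
	pthread_t fetch_thread;
};

/* Producer: stands in for the thread that fetches future data. */
static void *fetch_worker(void *arg)
{
	struct cache_handle *h = arg;

	for (int i = 0; i < NCHUNKS; i++) {
		sem_wait(&h->prepare_sem);	/* wait for a free buffer */
		memset(h->prepare->mem, 'a' + i, CHUNK);
		h->prepare->seq = i;
		sem_post(&h->ready_sem);	/* hand it to the consumer */
	}
	return NULL;
}

/* Consumer: copies every chunk into 'out', returns bytes copied or -1. */
static int run_demo(char *out)
{
	struct cache_handle h;
	int n = 0;

	h.ready = &h.bufs[0];
	h.prepare = &h.bufs[1];
	if (sem_init(&h.ready_sem, 0, 0) < 0)	/* nothing filled yet */
		return -1;
	if (sem_init(&h.prepare_sem, 0, 1) < 0)	/* one buffer is free */
		return -1;
	if (pthread_create(&h.fetch_thread, NULL, fetch_worker, &h) != 0)
		return -1;

	for (int i = 0; i < NCHUNKS; i++) {
		struct read_cache *tmp;

		sem_wait(&h.ready_sem);		/* 'prepare' is now full */
		tmp = h.ready;			/* swap the two pointers */
		h.ready = h.prepare;
		h.prepare = tmp;
		sem_post(&h.prepare_sem);	/* producer may refill */

		assert(h.ready->seq == i);	/* chunks arrive in order */
		memcpy(out + n, h.ready->mem, CHUNK);
		n += CHUNK;
	}

	pthread_join(h.fetch_thread, NULL);
	sem_destroy(&h.ready_sem);
	sem_destroy(&h.prepare_sem);
	out[n] = '\0';
	return n;
}
```

Calling run_demo() with a buffer of at least CHUNK * NCHUNKS + 1 bytes copies
the chunks out in order. Only the consumer swaps the pointers, and the producer
touches 'prepare' only between its sem_wait() and sem_post(), so the two
threads never work on the same buffer at the same time.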
Thanks
Yuan