[sheepdog] [PATCH v1 2/3] sheepfs: make fetching data for cache become async operation
Robin Dong
robin.k.dong at gmail.com
Thu Mar 6 11:08:21 CET 2014
2014-03-06 17:23 GMT+08:00 Liu Yuan <namei.unix at gmail.com>:
> On Thu, Mar 06, 2014 at 03:47:20PM +0800, Robin Dong wrote:
> > From: Robin Dong <sanbai at taobao.com>
> >
> > Currently, we have added a cache for the http interface of sheepfs. But it
> > only fetches data via an http request once a fuse read operation has run
> > past the end of the cache.
> >
> > For better performance, we use the 'double buffer' technique: fuse reads
> > from one buffer while a newly created thread reads future data into another
> > buffer at the same time. This makes read operations smoother and faster.
> >
> > We use two pointers, 'ready' and 'prepare', to point to the double buffers
> > and use the classic 'consumer and producer model' to avoid race conditions.
> >
> > Signed-off-by: Robin Dong <sanbai at taobao.com>
> > ---
> > sheepfs/http.c | 165 +++++++++++++++++++++++++++++++++++++++++++--------------
> > 1 file changed, 124 insertions(+), 41 deletions(-)
> >
> > diff --git a/sheepfs/http.c b/sheepfs/http.c
> > index 7df05ad..5610110 100644
> > --- a/sheepfs/http.c
> > +++ b/sheepfs/http.c
> > @@ -19,6 +19,7 @@
> > #include <stdio.h>
> > #include <time.h>
> > #include <curl/curl.h>
> > +#include <semaphore.h>
> >
> > #include "strbuf.h"
> > #include "sheepfs.h"
> > @@ -157,16 +158,17 @@ static size_t curl_read_object(const char *url, char *buf, size_t size,
> > }
> > if ((size_t)content_length > size) {
> > sheepfs_pr("Failed to get correct CONTENT_LENGTH, "
> > - "content_length: %"PRIu64", get_size: %"PRIu64,
> > - (size_t)content_length, size);
> > + "content_length: %"PRIu64", get_size: %"
> > + PRIu64, (size_t)content_length, size);
> > size = 0;
> > } else {
> > - sd_debug("Read out %"PRIu64" data from %s", size, url);
> > + sheepfs_pr("Read out %"PRIu64" data from %s", size,
> > + url);
> > size = (size_t)content_length;
> > }
> > } else {
> > sheepfs_pr("Failed to call libcurl res: %s, url: %s",
> > - curl_easy_strerror(res), url);
> > + curl_easy_strerror(res), url);
> > size = 0;
> > }
> > out:
> > @@ -234,19 +236,69 @@ out:
> > /* no rationale */
> > #define CACHE_SIZE (64 * 1024 * 1024)
> >
> > -struct cache_handle {
> > +struct cache_s {
> > char *mem;
> > off_t offset;
> > size_t size;
> > };
>
> What does _s mean? I'd suggest struct read_cache
>
"_s" means "struct"
>
> >
> > +struct cache_handle {
> > + char path[PATH_MAX];
> > + struct cache_s *ready;
> > + struct cache_s *prepare;
> > + pthread_t fetch_thread;
> > + sem_t ready_sem;
> > + sem_t prepare_sem;
>
> why choose sem_t over pthread mutex, any reason?
>
Because a pthread mutex is very hard to use in our model. For example:
(lock and unlock many times)
....
pthread_mutex_unlock()
pthread_mutex_destroy()
and
(lock and unlock many times)
....
pthread_mutex_lock()
pthread_mutex_destroy()
The destroy will return EBUSY and cause a panic in both cases above.
In the "consumer and producer model", the consumer (or producer) may exit
in any state, which means the pthread mutex could be either locked or
unlocked at that point.
How could we destroy a pthread mutex that may be locked or unlocked?
Using PTHREAD_MUTEX_ERRORCHECK is another pain in the neck: it does not
allow one thread to lock the same mutex twice.
pthread_cond_t may lose a signal, so the best choice is the standard semaphore.
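
To illustrate the model, here is a minimal sketch of a double buffer guarded
by two POSIX semaphores. It is not the code from the patch: the struct layout,
the function names (fetcher, run_double_buffer) and the sizes are hypothetical,
and only the names ready_sem and prepare_sem are borrowed from the patch. It
assumes a platform with unnamed semaphores (sem_init), such as Linux.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

#define NBUF   2	/* double buffer */
#define CHUNK  4	/* hypothetical chunk size, in ints */
#define NCHUNK 8	/* hypothetical number of chunks to fetch */

struct dbuf {
	int data[NBUF][CHUNK];
	sem_t ready_sem;	/* counts filled buffers the reader may consume */
	sem_t prepare_sem;	/* counts empty buffers the fetcher may fill */
};

/* Producer: stands in for the thread that fetches future data. */
static void *fetcher(void *arg)
{
	struct dbuf *d = arg;

	for (int c = 0; c < NCHUNK; c++) {
		sem_wait(&d->prepare_sem);	/* wait for an empty buffer */
		for (int i = 0; i < CHUNK; i++)
			d->data[c % NBUF][i] = c * CHUNK + i;
		sem_post(&d->ready_sem);	/* hand it to the reader */
	}
	return NULL;
}

/* Consumer: stands in for the fuse read path. Returns the sum of all data. */
static long run_double_buffer(void)
{
	struct dbuf d;
	pthread_t tid;
	long sum = 0;
	int ret;

	ret = sem_init(&d.ready_sem, 0, 0);	/* nothing is ready yet */
	assert(ret == 0);
	ret = sem_init(&d.prepare_sem, 0, NBUF);	/* both buffers are empty */
	assert(ret == 0);
	ret = pthread_create(&tid, NULL, fetcher, &d);
	assert(ret == 0);

	for (int c = 0; c < NCHUNK; c++) {
		sem_wait(&d.ready_sem);		/* wait for a filled buffer */
		for (int i = 0; i < CHUNK; i++)
			sum += d.data[c % NBUF][i];
		sem_post(&d.prepare_sem);	/* give the buffer back */
	}

	/* Both threads have finished, so the semaphores are idle here. */
	pthread_join(tid, NULL);
	sem_destroy(&d.ready_sem);
	sem_destroy(&d.prepare_sem);
	return sum;
}
```

In this sketch the fetcher can run at most two buffers ahead of the reader,
and the semaphores are destroyed only after pthread_join(), so no thread can
still be blocked on them at that point.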
> > + bool stop;
> > + uatomic_bool fetching;
> > + off_t fetch_offset;
> > + size_t obj_size;
>
> I think we should avoid using size_t, since its width differs between 32-bit
> and 64-bit builds; use uint64_t explicitly.
>
All "size" parameters in object_read() and object_write() are 'size_t'; in
fact, we can't create a file larger than (size_t) in fuse.
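
As a rough illustration of the trade-off, the sketch below narrows a 64-bit
size to size_t only when the value fits. The helper name size_fits is
hypothetical and does not exist in sheepfs; it only shows one way a uint64_t
field could coexist with size_t-based interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The check below relies on uint64_t being at least as wide as size_t. */
static_assert(sizeof(uint64_t) >= sizeof(size_t),
	      "size_t wider than 64 bits is not handled");

/* Hypothetical helper: narrow a 64-bit size to size_t only when it fits. */
static bool size_fits(uint64_t obj_size, size_t *out)
{
	if (obj_size > SIZE_MAX)
		return false;	/* would be truncated on a 32-bit build */
	*out = (size_t)obj_size;
	return true;
}
```

On a 64-bit build the check always passes; on a 32-bit build it rejects
sizes above SIZE_MAX instead of silently truncating them.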
> Thanks
> Yuan
>
--
Best Regards
Robin Dong