[sheepdog] [PATCH v0, RFC] sheep: writeback cache semantics in backend store
Liu Yuan
namei.unix at gmail.com
Fri Aug 17 05:01:41 CEST 2012
On 08/16/2012 07:45 PM, Hitoshi Mitake wrote:
> From: Hitoshi Mitake <mitake.hitoshi at lab.ntt.co.jp>
>
> Hi sheepdog list, nice to meet you.
>
> This patch implements writeback cache semantics in the backend
> store of sheep. The current backend store, farm, calls open() with
> O_DSYNC, so every object write causes a slow disk access. This
> overhead is unnecessary, because the current qemu block driver
> already invokes SD_OP_FLUSH_VDI explicitly for its object cache.
> Flushing the disk cache when SD_OP_FLUSH_VDI arrives, instead of on
> every object write, is enough for the current sheep.
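>
> In code terms the change is tiny; roughly like this (the base flags
> here are an assumption for illustration, only the O_DSYNC handling
> is what this patch actually does in farm_init()):
>
>     /* writethrough (current farm behaviour): O_DSYNC forces every
>      * object write down to the disk synchronously */
>     int def_open_flags = O_RDWR | O_CREAT | O_DSYNC;
>
>     /* writeback (this patch): drop O_DSYNC and flush only when
>      * SD_OP_FLUSH_VDI tells us to */
>     if (sys->store_writeback)
>             def_open_flags &= ~O_DSYNC;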
>
> To improve performance by reducing needless disk access, this patch
> adds a new inter-sheep operation, SD_OP_FLUSH_PEER. It is used in a
> flow like this (see the sketch below):
> qemu sends SD_OP_FLUSH_VDI -> the gateway sheep sends
> SD_OP_FLUSH_PEER -> the other sheep
> The sheep that receive SD_OP_FLUSH_PEER then flush their disk
> caches with the syncfs() system call.
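>
> As a sketch (the handler names match the patch below; the bodies
> are simplified):
>
>     /* gateway side: SD_OP_FLUSH_VDI arrives from qemu */
>     static int local_flush_vdi(struct request *req)
>     {
>             if (sys->store_writeback)
>                     /* fan out as SD_OP_FLUSH_PEER to the peers */
>                     gateway_forward_request(req);
>             ...
>     }
>
>     /* peer side: SD_OP_FLUSH_PEER handler; one syncfs() call on an
>      * object fd flushes the whole filesystem backing the object
>      * store, so all cached object writes reach the disk */
>     int peer_flush_dcache(struct request *req);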
>
>
> Below is the evaluation result with dbench:
>
> Before applying this patch, without -s (O_SYNC) option:
> Throughput 13.9269 MB/sec  1 clients  1 procs  max_latency=818.428 ms
> Before applying this patch, with -s option:
> Throughput 2.76792 MB/sec (sync open)  1 clients  1 procs  max_latency=291.670 ms
>
> After applying this patch, without -s option:
> Throughput 29.7306 MB/sec  1 clients  1 procs  max_latency=1344.463 ms
> After applying this patch, with -s option:
> Throughput 4.29357 MB/sec (sync open)  1 clients  1 procs  max_latency=450.045 ms
>
>
> This patch adds a new command line option, -W, to sheep. With -W,
> sheep uses writeback cache semantics in the backend store. I added
> this option mainly for easy testing and evaluation. If writeback
> semantics turn out to be a better default than the previous
> writethrough semantics, I'll delete the option again.
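>
> For example (assuming the usual way of starting sheep with a store
> directory):
>
>     $ sheep -W /var/lib/sheepdog   # writeback backend store
>     $ sheep /var/lib/sheepdog      # writethrough, as before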
>
> This patch may contain lots of bad code because I'm new to sheepdog.
> I'd like to hear your comments.
>
> Cc: MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp>
> Signed-off-by: Hitoshi Mitake <mitake.hitoshi at lab.ntt.co.jp>
>
> ---
> include/internal_proto.h | 1 +
> sheep/farm/farm.c | 3 +++
> sheep/gateway.c | 2 +-
> sheep/ops.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
> sheep/sheep.c | 7 ++++++-
> sheep/sheep_priv.h | 5 +++++
> 6 files changed, 60 insertions(+), 2 deletions(-)
>
> diff --git a/include/internal_proto.h b/include/internal_proto.h
> index 83d98f1..f34ef92 100644
> --- a/include/internal_proto.h
> +++ b/include/internal_proto.h
> @@ -63,6 +63,7 @@
> #define SD_OP_ENABLE_RECOVER 0xA8
> #define SD_OP_DISABLE_RECOVER 0xA9
> #define SD_OP_INFO_RECOVER 0xAA
> +#define SD_OP_FLUSH_PEER 0xAB
>
> /* internal flags for hdr.flags, must be above 0x80 */
> #define SD_FLAG_CMD_RECOVERY 0x0080
> diff --git a/sheep/farm/farm.c b/sheep/farm/farm.c
> index 7eeae9a..991c009 100644
> --- a/sheep/farm/farm.c
> +++ b/sheep/farm/farm.c
> @@ -362,6 +362,9 @@ static int farm_init(char *p)
> iocb.epoch = sys->epoch ? sys->epoch - 1 : 0;
> farm_cleanup_sys_obj(&iocb);
>
> + if (sys->store_writeback)
> + def_open_flags &= ~O_DSYNC;
> +
> return SD_RES_SUCCESS;
> err:
> return SD_RES_EIO;
> diff --git a/sheep/gateway.c b/sheep/gateway.c
> index bdbd08c..79fdd07 100644
> --- a/sheep/gateway.c
> +++ b/sheep/gateway.c
> @@ -225,7 +225,7 @@ static inline void gateway_init_fwd_hdr(struct sd_req *fwd, struct sd_req *hdr)
> fwd->proto_ver = SD_SHEEP_PROTO_VER;
> }
>
> -static int gateway_forward_request(struct request *req)
> +int gateway_forward_request(struct request *req)
> {
> int i, err_ret = SD_RES_SUCCESS, ret, local = -1;
> unsigned wlen;
> diff --git a/sheep/ops.c b/sheep/ops.c
> index 8ca8748..0ec4b63 100644
> --- a/sheep/ops.c
> +++ b/sheep/ops.c
> @@ -22,6 +22,11 @@
> #include <sys/stat.h>
> #include <pthread.h>
>
> +#include <asm/unistd.h> /* for __NR_syncfs */
> +#ifndef __NR_syncfs
> +#define __NR_syncfs 306
> +#endif
> +
> #include "sheep_priv.h"
> #include "strbuf.h"
> #include "trace/trace.h"
> @@ -584,6 +589,9 @@ static int local_get_snap_file(struct request *req)
>
> static int local_flush_vdi(struct request *req)
> {
> + if (sys->store_writeback)
> + gateway_forward_request(req);
> +
> if (!sys->enable_write_cache)
> return SD_RES_SUCCESS;
> return object_cache_flush_vdi(req);
> @@ -837,6 +845,35 @@ out:
> return ret;
> }
>
> +static int syncfs(int fd)
> +{
> + return syscall(__NR_syncfs, fd);
> +}
> +
syncfs() only appeared in Linux 2.6.39, which is rather new; I think
most systems won't support it.
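
A runtime fallback might be safer than relying on the __NR_syncfs
define alone; a rough, untested sketch (needs <unistd.h> and
<errno.h>):

	static int syncfs(int fd)
	{
		int ret = syscall(__NR_syncfs, fd);

		if (ret < 0 && errno == ENOSYS) {
			/* kernel older than 2.6.39: fall back to
			 * sync(), which flushes every mounted
			 * filesystem; coarser, but correct */
			sync();
			ret = 0;
		}
		return ret;
	}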
> +int peer_flush_dcache(struct request *req)
> +{
> + int fd;
> + struct sd_req *hdr = &req->rq;
> + uint64_t oid = hdr->obj.oid;
> + char path[PATH_MAX];
> +
> + sprintf(path, "%s%016"PRIx64, obj_path, oid);
> + fd = open(path, O_RDONLY);
> + if (fd < 0) {
> + eprintf("error at open() %s, %s\n", path, strerror(errno));
> + return SD_RES_NO_OBJ;
> + }
> +
> + if (syncfs(fd)) {
> + eprintf("error at syncfs(), %s\n", strerror(errno));
> + return SD_RES_EIO;
> + }
> +
> + close(fd);
> +
> + return SD_RES_SUCCESS;
> +}
> +
With the current implementation, you only flush the inode object, not
all the data objects belonging to the targeted VDI. I think this is the
hardest part to implement. Probably you need to call
exec_local_request() to take advantage of the retry mechanism.
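
Roughly something like this (illustrative only; how you issue the
per-object flush, and whether you reuse SD_OP_FLUSH_PEER for it, is up
to you):

	/* walk the inode and flush every allocated data object, so the
	 * request reaches the nodes that actually hold the data */
	for (i = 0; i < MAX_DATA_OBJS; i++) {
		uint32_t vid = inode->data_vdi_id[i];

		if (!vid)
			continue;	/* not allocated */

		oid = vid_to_data_oid(vid, i);
		/* issue the flush for oid through the local request
		 * path so a failed request is retried */
		...
	}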
Thanks,
Yuan