On 08/16/2012 07:45 PM, Hitoshi Mitake wrote:
> From: Hitoshi Mitake <mitake.hitoshi at lab.ntt.co.jp>
>
> Hi sheepdog list, nice to meet you.
>
> This patch implements writeback cache semantics in the backend
> store of sheep. The current backend store, farm, calls open() with
> O_DSYNC, so every object write causes a slow disk access. This
> overhead is unnecessary, because the current qemu block driver
> invokes SD_OP_FLUSH_VDI explicitly for the object cache. Flushing
> the disk cache when SD_OP_FLUSH_VDI arrives, instead of on every
> object write, is enough for the current sheep.
>
> To improve performance by reducing needless disk access, this patch
> adds a new inter-sheep operation, SD_OP_FLUSH_PEER. It is used in a
> situation like this:
> qemu sends SD_OP_FLUSH_VDI -> the gateway sheep sends SD_OP_FLUSH_PEER ->
> the other sheep
> Sheep that receive SD_OP_FLUSH_PEER flush their disk cache with the
> syncfs() system call.
>
>
> Below is the evaluation result with dbench:
>
> Before applying this patch, without -s (O_SYNC) option:
> Throughput 13.9269 MB/sec  1 clients  1 procs  max_latency=818.428 ms
> Before applying this patch, with -s option:
> Throughput 2.76792 MB/sec (sync open)  1 clients  1 procs  max_latency=291.670 ms
>
> After applying this patch, without -s option:
> Throughput 29.7306 MB/sec  1 clients  1 procs  max_latency=1344.463 ms
> After applying this patch, with -s option:
> Throughput 4.29357 MB/sec (sync open)  1 clients  1 procs  max_latency=450.045 ms
>
>
> This patch adds a new command line option, -W, to sheep. With -W,
> sheep uses writeback cache semantics in the backend store. I added
> this option mainly for easy testing and evaluation. If writeback
> cache semantics turns out to be a better default than the previous
> writethrough semantics, I'll delete the option again.
>
> This patch may contain lots of bad code because I'm new to sheepdog.
> I'd like to hear your comments.
>
> Cc: MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp>
> Signed-off-by: Hitoshi Mitake <mitake.hitoshi at lab.ntt.co.jp>
>
> ---
>  include/internal_proto.h |    1 +
>  sheep/farm/farm.c        |    3 +++
>  sheep/gateway.c          |    2 +-
>  sheep/ops.c              |   44 ++++++++++++++++++++++++++++++++++++++++++++
>  sheep/sheep.c            |    7 ++++++-
>  sheep/sheep_priv.h       |    5 +++++
>  6 files changed, 60 insertions(+), 2 deletions(-)
>
> diff --git a/include/internal_proto.h b/include/internal_proto.h
> index 83d98f1..f34ef92 100644
> --- a/include/internal_proto.h
> +++ b/include/internal_proto.h
> @@ -63,6 +63,7 @@
>  #define SD_OP_ENABLE_RECOVER 0xA8
>  #define SD_OP_DISABLE_RECOVER 0xA9
>  #define SD_OP_INFO_RECOVER 0xAA
> +#define SD_OP_FLUSH_PEER 0xAB
>
>  /* internal flags for hdr.flags, must be above 0x80 */
>  #define SD_FLAG_CMD_RECOVERY 0x0080
> diff --git a/sheep/farm/farm.c b/sheep/farm/farm.c
> index 7eeae9a..991c009 100644
> --- a/sheep/farm/farm.c
> +++ b/sheep/farm/farm.c
> @@ -362,6 +362,9 @@ static int farm_init(char *p)
>  	iocb.epoch = sys->epoch ? sys->epoch - 1 : 0;
>  	farm_cleanup_sys_obj(&iocb);
>
> +	if (sys->store_writeback)
> +		def_open_flags &= ~O_DSYNC;
> +
>  	return SD_RES_SUCCESS;
>  err:
>  	return SD_RES_EIO;
> diff --git a/sheep/gateway.c b/sheep/gateway.c
> index bdbd08c..79fdd07 100644
> --- a/sheep/gateway.c
> +++ b/sheep/gateway.c
> @@ -225,7 +225,7 @@ static inline void gateway_init_fwd_hdr(struct sd_req *fwd, struct sd_req *hdr)
>  	fwd->proto_ver = SD_SHEEP_PROTO_VER;
>  }
>
> -static int gateway_forward_request(struct request *req)
> +int gateway_forward_request(struct request *req)
>  {
>  	int i, err_ret = SD_RES_SUCCESS, ret, local = -1;
>  	unsigned wlen;
> diff --git a/sheep/ops.c b/sheep/ops.c
> index 8ca8748..0ec4b63 100644
> --- a/sheep/ops.c
> +++ b/sheep/ops.c
> @@ -22,6 +22,11 @@
>  #include <sys/stat.h>
>  #include <pthread.h>
>
> +#include <asm/unistd.h> /* for __NR_syncfs */
> +#ifndef __NR_syncfs
> +#define __NR_syncfs 306
> +#endif
> +
>  #include "sheep_priv.h"
>  #include "strbuf.h"
>  #include "trace/trace.h"
> @@ -584,6 +589,9 @@ static int local_get_snap_file(struct request *req)
>
>  static int local_flush_vdi(struct request *req)
>  {
> +	if (sys->store_writeback)
> +		gateway_forward_request(req);
> +
>  	if (!sys->enable_write_cache)
>  		return SD_RES_SUCCESS;
>  	return object_cache_flush_vdi(req);
> @@ -837,6 +845,35 @@ out:
>  	return ret;
>  }
>
> +static int syncfs(int fd)
> +{
> +	return syscall(__NR_syncfs, fd);
> +}
> +

syncfs() only appeared in Linux 2.6.39; it is rather new, and I think
most systems won't support it. (A rough fallback sketch is at the end
of this mail.)

> +int peer_flush_dcache(struct request *req)
> +{
> +	int fd;
> +	struct sd_req *hdr = &req->rq;
> +	uint64_t oid = hdr->obj.oid;
> +	char path[PATH_MAX];
> +
> +	sprintf(path, "%s%016"PRIx64, obj_path, oid);
> +	fd = open(path, O_RDONLY);
> +	if (fd < 0) {
> +		eprintf("error at open() %s, %s\n", path, strerror(errno));
> +		return SD_RES_NO_OBJ;
> +	}
> +
> +	if (syncfs(fd)) {
> +		eprintf("error at syncfs(), %s\n", strerror(errno));
> +		return SD_RES_EIO;
> +	}
> +
> +	close(fd);
> +
> +	return SD_RES_SUCCESS;
> +}
> +

With the current implementation, you only flush the inode object, not
all the data objects that belong to the targeted VDI. I think this is
the hardest part to implement. Probably you need to call
exec_local_request() to take advantage of the retry mechanism.

Thanks,
Yuan
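
P.S. To make the syncfs() concern above concrete, here is a minimal,
untested sketch of what a runtime fallback could look like. It is not
part of the patch; the helper name sync_backend_fd() is made up for
this example (which also avoids clashing with the syncfs() wrapper
that newer glibc versions provide), and it simply falls back to a
system-wide sync() when the raw syscall returns ENOSYS on kernels
older than 2.6.39:

#include <errno.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <asm/unistd.h>	/* for __NR_syncfs, as in the patch */

#ifndef __NR_syncfs
#define __NR_syncfs 306
#endif

/*
 * Hypothetical helper, not part of the patch: flush the filesystem
 * backing fd.  On kernels without syncfs (< 2.6.39) the syscall
 * fails with ENOSYS, so fall back to a global sync().
 */
static int sync_backend_fd(int fd)
{
	if (syscall(__NR_syncfs, fd) == 0)
		return 0;

	if (errno == ENOSYS) {
		/* sync() flushes every filesystem and cannot report
		 * errors, so this is best effort only. */
		sync();
		return 0;
	}

	return -1;	/* genuine syncfs() failure */
}

Whether silently widening the flush to the whole system is acceptable
is a separate question; the point is only that the missing-syscall
case can be detected at runtime instead of failing the flush.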