[sheepdog] [RFC PATCH 0/3] Introduce a new journal mechanism

Liu Yuan namei.unix at gmail.com
Sat Nov 3 16:09:44 CET 2012


From: Liu Yuan <tailai.ly at taobao.com>

This patch set is meant to completely replace the current journal.c, which
only journals inode object updates. The new journal mechanism journals both
inode and data object updates and achieves much better IO performance: writes
are nearly 80 times faster in my test on a machine with SAS disks when the
journal device is external to the backend storage, and performance is almost
the same as the current master default (no journal mode) when the journal
device is internal.

The journal recovery process at startup (which can be skipped if required)
takes less than 10 seconds.

For a quick overview of the 'journal file' I introduced: the idea is very
simple. Use a dedicated device to log all the IO operations sequentially;
it is then safe to change the backend IO from O_DSYNC & O_DIRECT to plain
O_RDWR (buffered IO), which benefits both read and write performance a lot.
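
To make the idea concrete, here is a minimal sketch of the write path, not
the code in this patch set. The record header layout and the names
jrec_hdr, backend_open_flags(), journal_append() and backend_write() are
assumptions made up for illustration; the journal fd is assumed to be
opened with O_APPEND | O_DSYNC so that each append is durable before the
buffered backend write is issued.

```c
#define _GNU_SOURCE             /* O_DIRECT on Linux/glibc */
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0              /* fallback where O_DIRECT is not exposed */
#endif

/* Hypothetical journal record header (not the patch's on-disk format). */
struct jrec_hdr {
    uint64_t oid;               /* object id */
    uint64_t offset;            /* offset inside the object */
    uint64_t len;               /* payload length in bytes */
};

/* Backend open flags: buffered IO is acceptable once a journal exists. */
static int backend_open_flags(int journal_enabled)
{
    if (journal_enabled)
        return O_RDWR;
    return O_RDWR | O_DSYNC | O_DIRECT;
}

/*
 * Step 1: append header + payload sequentially to the journal.
 * jfd is assumed to be opened with O_APPEND | O_DSYNC, so the record
 * is on stable storage when writev() returns successfully.
 */
static int journal_append(int jfd, uint64_t oid, uint64_t off,
                          const void *data, uint64_t len)
{
    struct jrec_hdr h = { oid, off, len };
    struct iovec iov[2];
    ssize_t n;

    iov[0].iov_base = &h;
    iov[0].iov_len = sizeof(h);
    iov[1].iov_base = (void *)data;
    iov[1].iov_len = (size_t)len;

    n = writev(jfd, iov, 2);
    return n == (ssize_t)(sizeof(h) + len) ? 0 : -1;
}

/* Step 2: buffered write to the backend object file, no sync needed. */
static int backend_write(int bfd, uint64_t off, const void *data,
                         uint64_t len)
{
    return pwrite(bfd, data, (size_t)len, (off_t)off) == (ssize_t)len ? 0 : -1;
}
```

A caller would run journal_append() first and backend_write() only after
it succeeds; on a crash, whatever is in the journal can be replayed onto
the backend files.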

Usage:
 $ sheep -j /path/to/journal_directory other_options

*NOTE* Because of the internal design, you need to pass the path of a
directory rather than a file. This means we can't operate on a raw
device file.

Some perf numbers:
-----------------------------------------------------
 ./script/vditest test -c none -w -B 4096 -h1 -a
on jd branch:
 Total write throughput: 33875968.0B/s (32.3M/s), IOPS 8270.5/s.

and on current master:
 Total write throughput: 410419.2B/s (400.8K/s), IOPS 100.2/s.

-----------------------------------------------------
For comparison, I also add an object cache (buffered IO) test number:
 ./script/vditest test -c writeback -w -B 4096 -h1 -a
 Total write throughput: 61089792.0B/s (58.3M/s), IOPS 14914.5/s.

With 'journal file', even on a disk of the same IO capability, we can
get 55% of buffered IO performance. I have also tested 'journal file'
mode with a ramdisk set up as the journal device and got nearly 80% of
buffered IO perf. So I conclude that with a faster journal device we can
get even more. I only tested write performance, but I think read
performance will increase too.
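
For reference, the ratios behind "nearly 80 times" and "55%" can be
recomputed from the three vditest throughput figures quoted above. The
snippet below is only that arithmetic; the constant and function names
are made up for illustration.

```c
#include <stdio.h>

/* Throughput figures copied from the vditest runs quoted above (bytes/s). */
static const double JOURNAL_BPS  = 33875968.0;  /* jd branch */
static const double MASTER_BPS   = 410419.2;    /* current master */
static const double BUFFERED_BPS = 61089792.0;  /* object cache, writeback */

static double ratio(double a, double b)
{
    return a / b;
}

static void print_ratios(void)
{
    printf("journal vs master:   %.1fx\n",
           ratio(JOURNAL_BPS, MASTER_BPS));
    printf("journal vs buffered: %.1f%%\n",
           100.0 * ratio(JOURNAL_BPS, BUFFERED_BPS));
}
```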

======================================================================
Work remains:
 Besides the code, I want to hear more comments on what the interface should
 look like.

 I am thinking of the following interface:
 -j internal,size=256 # enables journaling and logs updates on the same disk,
                        with a log size of 256M bytes
 -j external=/path/to/directory # enables journaling and logs updates on an
                                  external device, with the default size of 512M
 -j external=/path/to/directory,skip # like above, but skips journal recovery at startup

 I am not sure whether we should enable journaling by default, given that it
 performs almost the same as the no-journal mode even with an internal journal
 device.
 

 Once agreement is reached, I will remove journal.c and finalize the patch set.

Liu Yuan (3):
  store/plain: move flag operation into get_open_flags()
  test: fix spurious failture of 001 002
  sheep: introduce journal file to boost IO performance

 include/util.h       |    5 +
 sheep/Makefile.am    |    2 +-
 sheep/journal_file.c |  386 ++++++++++++++++++++++++++++++++++++++++++++++++++
 sheep/plain_store.c  |   42 ++++--
 sheep/sheep.c        |   15 +-
 sheep/sheep_priv.h   |    8 +-
 sheep/store.c        |    9 +-
 tests/001            |    4 +-
 tests/002            |    1 +
 9 files changed, 450 insertions(+), 22 deletions(-)
 create mode 100644 sheep/journal_file.c

-- 
1.7.9.5



