From: Liu Yuan <tailai.ly at taobao.com>

This patch set is meant to completely replace the current journal.c, which
only journals inode object updates. The new journal mechanism journals both
inode and data object updates. It achieves much better IO performance (about
80 times faster writes in my test on a machine with SAS disks) when the
journal device is external to the backend storage, and almost the same
performance as the current master default (no journal mode) when the journal
device is internal. The journal recovery process at startup (which can be
skipped if required) would take less than 10 seconds.

For a quick overview of the 'journal file' I introduced, the idea is very
simple: use a dedicated device to log all the IO operations sequentially.
It is then safe to change the backend IO from O_DSYNC & O_DIRECT to O_RDWR
(buffered IO), which benefits both read and write performance a lot.

Usage:
 $ sheep -j /path/to/journal_directory other_options

*NOTE*
Because of the internal design, you need to pass the path of a directory
rather than a file. This means we can't operate on a raw device file.

Some perf numbers:
-----------------------------------------------------
./script/vditest test -c none -w -B 4096 -h1 -a

on jd branch:
Total write throughput: 33875968.0B/s (32.3M/s), IOPS 8270.5/s.

and on current master:
Total write throughput: 410419.2B/s (400.8K/s), IOPS 100.2/s.
-----------------------------------------------------

For comparison, I also add an object cache (buffered IO) test number:

./script/vditest test -c writeback -w -B 4096 -h1 -a
Total write throughput: 61089792.0B/s (58.3M/s), IOPS 14914.5/s.

With 'journal file', even when the journal sits on a disk of the same IO
capability, we get 55% of buffered IO performance. I have also tested
'journal file' mode with a ramdisk as the journal device and got nearly 80%
of buffered IO performance. So I conclude that with a faster journal device
we can get even more.

I only tested write performance, but I think read performance will improve
too.

======================================================================
Work remains:

Besides the code, I want to hear more comments on how the interface should
be designed. I am thinking of the following interface:

 -j internal,size=256
	# enables journaling and logs updates on the same disk, with a log
	# device size of 256M bytes
 -j external=/path/to/directory
	# enables journaling and logs updates on an external device, with
	# the default size of 512M
 -j external=/path/to/directory,skip
	# like above, but skips journal recovery at startup

I am not sure whether we should enable journaling by default, given that
journaling performs almost the same as no journaling even with an internal
journal device.

Once agreement is reached, I will remove journal.c and finalize the patch
set.

Liu Yuan (3):
  store/plain: move flag operation into get_open_flags()
  test: fix spurious failture of 001 002
  sheep: introduce journal file to boost IO performance

 include/util.h       |    5 +
 sheep/Makefile.am    |    2 +-
 sheep/journal_file.c |  386 ++++++++++++++++++++++++++++++++++++++++++++++++++
 sheep/plain_store.c  |   42 ++++--
 sheep/sheep.c        |   15 +-
 sheep/sheep_priv.h   |    8 +-
 sheep/store.c        |    9 +-
 tests/001            |    4 +-
 tests/002            |    1 +
 9 files changed, 450 insertions(+), 22 deletions(-)
 create mode 100644 sheep/journal_file.c

-- 
1.7.9.5