[sheepdog] [RFC PATCH 0/3] Introduce a new journal mechanism

MORITA Kazutaka morita.kazutaka at gmail.com
Mon Nov 5 08:22:00 CET 2012


At Sat,  3 Nov 2012 23:09:44 +0800,
Liu Yuan wrote:
> 
> From: Liu Yuan <tailai.ly at taobao.com>
> 
> This patch set is meant to completely replace the current journal.c, which only
> journals inode object updates. The new journal mechanism journals both inode
> and data object updates and achieves much better IO performance (writes up to
> nearly 80 times faster in a test on my SAS disk machine) if the journal device
> is external to the backend storage, and almost the same performance as the
> current master default (no journal mode) if the journal device is internal.
> 
> The journal recovery process (can be skipped if required) at startup would
> cost less than 10 seconds.
> 
> For a quick overview of the 'journal file' I introduced, the idea is very
> simple: use a dedicated device to log all the IO operations in a
> sequential manner, and then we are safe to change the backend IO
> operations from O_DSYNC & O_DIRECT into O_RDWR (buffered IO), which will
> benefit both read & write performance a lot.

I'd like to have an option to enable O_DIRECT for backend I/Os even if
journaling is enabled.  We want to save memory to run as many VMs as
possible.
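
To make the trade-off concrete, here is a minimal Python sketch of the
general idea described above: append each update synchronously to a
sequential journal, then apply it to the backend with ordinary buffered
I/O, and reapply the journal at startup. It is not the code from this
patch set. The record layout (HEADER), the function names and the
backend_path_for callback are all hypothetical, and it assumes a
Unix-like system where os.pwrite is available.

```python
import os
import struct

# Hypothetical record header: object id, offset, payload length.
HEADER = struct.Struct("<QQI")

# O_DSYNC is not exposed on every platform; fall back to fsync() if missing.
_DSYNC = getattr(os, "O_DSYNC", 0)


def open_journal(path):
    """Open the journal for sequential, synchronous appends."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND | _DSYNC
    return os.open(path, flags, 0o600)


def write_backend(path, offset, data):
    """Apply an update to the backend object with plain buffered I/O."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.pwrite(fd, data, offset)
    finally:
        os.close(fd)


def journaled_write(journal_fd, backend_path_for, oid, offset, data):
    """Log the update to the journal first, then write it to the backend."""
    os.write(journal_fd, HEADER.pack(oid, offset, len(data)) + data)
    if not _DSYNC:
        os.fsync(journal_fd)
    write_backend(backend_path_for(oid), offset, data)


def replay(journal_path, backend_path_for):
    """Reapply every complete record in the journal (startup recovery)."""
    with open(journal_path, "rb") as journal:
        while True:
            header = journal.read(HEADER.size)
            if len(header) < HEADER.size:
                break  # end of journal
            oid, offset, length = HEADER.unpack(header)
            data = journal.read(length)
            if len(data) < length:
                break  # torn record at the tail; ignore it
            write_backend(backend_path_for(oid), offset, data)
```

An O_DIRECT variant of write_backend would additionally need aligned
buffers and offsets, which this sketch leaves out.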

> ======================================================================
> Work remains:
>  Besides the code, I want to hear more comments on how the interface should be
>  programmed.
> 
>  I am thinking of the following interface:
>  -j internal,size=256 # which enables journaling and log updates on the same disk
>                         with the log device size 256M bytes
>  -j external=/path/to/directory # which enables journaling and log updates on
>                                   external device with the default size 512M
>  -j external=/path/to/directory,skip # like above but skip journal recovery at startup

I have a patch in my queue to refine sheep command option handling
like collie.  With the patch, sheep option syntax is as follows:

  -<ch> <name>[:<key>=<value>,<key>=<value>,....]

Key and value pairs are optional.  E.g.

  -p 7001
  -c zookeeper:servers=xx.xx.xx.xx,yy.yy.yy.yy
  -w object:size=100M,nopagecache=true

Can we use this syntax for journal options too?

  -j internal:size=256M
  -j external:file=/path/to/directory
  -j external:file=/path/to/directory,skip=true
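
As a rough illustration of how arguments in this form could be split
into a name and key/value pairs, here is a minimal Python sketch. It is
not the sheep or collie option parser, and parse_option_arg is a
hypothetical name. One assumption is worth calling out: in the zookeeper
example the comma separates server addresses inside a single value, so
the sketch treats a comma-separated token without '=' as a continuation
of the previous value. A real parser might resolve that ambiguity
differently.

```python
def parse_option_arg(arg):
    """Split '<name>[:<key>=<value>,<key>=<value>,...]' into (name, params)."""
    name, sep, rest = arg.partition(":")
    params = {}
    last_key = None
    if sep and rest:
        for token in rest.split(","):
            key, eq, value = token.partition("=")
            if eq:
                params[key] = value
                last_key = key
            elif last_key is not None:
                # Assumption: a token without '=' continues the previous
                # value, e.g. servers=xx.xx.xx.xx,yy.yy.yy.yy
                params[last_key] += "," + token
            else:
                raise ValueError("expected <key>=<value>, got %r" % token)
    return name, params


if __name__ == "__main__":
    for example in (
        "7001",
        "zookeeper:servers=xx.xx.xx.xx,yy.yy.yy.yy",
        "object:size=100M,nopagecache=true",
        "external:file=/path/to/directory,skip=true",
    ):
        print(example, "->", parse_option_arg(example))
```

With this rule a bare flag such as ",skip" would be folded into the
preceding value, which is one reason to spell it as skip=true.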

> 
>  I am not sure if we should enable journaling by default, because journaling runs
> at almost the same performance even with an internal journal device.

I think we should add this feature as an optional one at first.  I'd
like to see how it works in my and others' environments before enabling
it by default.

Thanks,

Kazutaka


