[sheepdog] [PATCH 0/4] introduce slice to farm

Liu Yuan namei.unix at gmail.com
Tue Jul 16 11:30:17 CEST 2013


A slice is a fixed-size chunk of an object to be stored in farm. We slice
each object into smaller chunks to get better deduplication.
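
As a rough illustration of why smaller fixed-size slices deduplicate better,
here is a minimal sketch. It is not the code from collie/farm/slice.c: every
name in it (slice_store, store_slice, store_object, MAX_SLICES) is
hypothetical, and FNV-1a stands in for SHA-1 only to keep the example free of
external dependencies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; not taken from collie/farm/slice.c. */
#define MAX_SLICES 1024

/* FNV-1a 64-bit: an assumed stand-in for SHA-1, chosen only so the
 * sketch needs no crypto library. */
static uint64_t fnv1a64(const unsigned char *buf, size_t len)
{
    uint64_t h = 14695981039346656037ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= buf[i];
        h *= 1099511628211ULL;
    }
    return h;
}

struct slice_store {
    uint64_t keys[MAX_SLICES]; /* digests of slices already stored */
    size_t nr;                 /* number of unique slices */
    size_t bytes;              /* bytes that would actually be written */
};

/* Store one slice unless a slice with the same digest already exists. */
static void store_slice(struct slice_store *s, const unsigned char *buf,
                        size_t len)
{
    uint64_t key = fnv1a64(buf, len);

    for (size_t i = 0; i < s->nr; i++)
        if (s->keys[i] == key)
            return; /* duplicate: nothing new to write */
    assert(s->nr < MAX_SLICES);
    s->keys[s->nr++] = key;
    s->bytes += len;
}

/* Cut an object into fixed-size slices and store each one. */
static void store_object(struct slice_store *s, const unsigned char *obj,
                         size_t obj_len, size_t slice_size)
{
    for (size_t off = 0; off < obj_len; off += slice_size) {
        size_t rest = obj_len - off;
        size_t len = rest < slice_size ? rest : slice_size;

        store_slice(s, obj + off, len);
    }
}
```

Two objects that differ in only one slice share all their other slices in the
store, whereas storing them whole keeps two full copies.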

In a test on a 200M cluster with 2 copies (so roughly 100M of data to back up),
I got the following results:

                   size  time    compress ratio
w/ slice (64K)  :  51M   2.037s       49%
w/ slice (128K) :  53M   1.223s       47%
w/ slice (256K) :  57M   1.216s       43%
w/ slice (512K) :  61M   1.205s       39%
w/o slice (4M)  :  97M   1.174s       3%
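
The ratio column appears consistent with the space saved relative to roughly
100M of original data. That reading is an inference from the numbers above,
not something stated in this letter, and saved_percent is a hypothetical
helper used only to show the arithmetic.

```c
#include <assert.h>

/* Percentage of space saved, rounded to the nearest integer.
 * Assumes the ratio column means (original - stored) / original. */
static int saved_percent(unsigned stored_mb, unsigned original_mb)
{
    assert(original_mb > 0 && stored_mb <= original_mb);
    return (int)(((original_mb - stored_mb) * 100 + original_mb / 2)
                 / original_mb);
}
```

For example, saved_percent(53, 100) corresponds to the 128K row and
saved_percent(97, 100) to the unsliced row.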

I chose a 128K slice size.

I also went a step further and tried compressing each slice before writing it
to disk. But because the images are virtually random files, I didn't get any
compression with zlib, and the backup took much longer.

You can try the test zlib patch on top of this series. Please drop the zlib
patch when merging the patch set.

Liu Yuan (4):
  sheep: don't trim before calculating sha1
  farm: clean up trunk.c
  farm: slice.c proper
  farm: use slice_{read, write} to read/write object

 collie/Makefile.am      |    2 +-
 collie/farm/farm.c      |    4 +-
 collie/farm/farm.h      |    4 +-
 collie/farm/sha1_file.c |   20 +++------
 collie/farm/slice.c     |  109 +++++++++++++++++++++++++++++++++++++++++++++++
 collie/farm/trunk.c     |    4 +-
 sheep/plain_store.c     |    5 ---
 7 files changed, 122 insertions(+), 26 deletions(-)
 create mode 100644 collie/farm/slice.c

-- 
1.7.9.5



