[sheepdog] [PATCH v7 00/10] replace structure of inode->data_vdi_id[] from array to b-tree

MORITA Kazutaka morita.kazutaka at gmail.com
Wed Nov 20 01:29:48 CET 2013


At Sat, 16 Nov 2013 22:57:29 +0800,
Robin Dong wrote:
> 
> Hi all,
> 
>    The size of a vdi can only reach 4TB because inode->data_vdi_id[] can only
>    hold 1 million objects. But 4TB is too small for storage applications
>    such as NAS and cloud disks, so we need to change the 'data_vdi_id' array to
>    a b-tree.
> 
>    This patchset adds a B-tree structure to sd_inode. It supports just two levels
>    (one root node and many leaf nodes), and with it the size of a vdi could reach
>    (4MB / sizeof(sd_extent_header)) * (4MB / sizeof(sd_extent)) * 4MB, which is about 680PB
>    in theory.
> 
>    Currently the vdi size can only grow to 16PB because the object index in the oid is just
>    32 bits, but that is certainly enough for many storage requirements.
> 
>    v6 --> v7 changes:
>    	1. add support for erasure-code
> 	2. change help information for hyper volume
> 	3. modify idx_to_oid() to support object-cache correctly
> 	4. add comment for test-case of hyper volume
> 
>    v5 --> v6 changes:
>    	1. add delete_one() support for hyper volume
> 	2. add fill_object_tree() support for hyper volume
> 
>    v4 --> v5 changes:
> 	1. object_cache works for hyper volume now
> 	2. put 'btree_counter' after 'data_vdi_id[]' in sd_inode
> 	3. add vdi_clone support for hyper volume
> 
>    v3 --> v4 changes:
> 	1. let hyper volume work with object cache
> 	2. let hyper volume work with vdi_list vdi_check
> 	3. let all test case pass
> 	4. add new test case for hyper volume
> 
>    v2 --> v3 changes:
> 	1. move "btree_counter" after inode->child_vdi_id[]
> 	2. add new interface to write inode meta data
> 	3. change the names of some macros
> 
>    v1 --> v2 changes:
> 	1. fix the problem with creating a 16PB vdi and running mkfs.xfs
> 	2. add comment and illustration to explain how B-tree works
> 
> Robin Dong (10):
>   sheep: change accessing of inode->data_vdi_id[] to function
>   sheep: replace structure of inode->data_vdi_id[] from array to btree
>   sheep: modify interface to write inode meta data
>   sheep: extend MAX number of objects
>   sheep: upgrade 'idx' in object_cache from 32bit to 64bit
>   sheep: add traverse_btree() to list and check vdi
>   sheep: use callback to implement fill_object_tree() and delete_one() for hyper volume
>   sheep: copy middle-node when vdi_clone() for hyper volume
>   sheep: support erasure code
>   sheep: add functional test case for hyper volume
> 
>  dog/cluster.c            |   36 ++-
>  dog/common.c             |   15 +-
>  dog/dog.h                |   13 +-
>  dog/farm/farm.c          |   12 +-
>  dog/vdi.c                |  257 ++++++++++++-----
>  include/sheepdog_proto.h |   97 ++++++-
>  lib/Makefile.am          |    2 +-
>  lib/option.c             |    7 +-
>  lib/sd_inode.c           |  696 ++++++++++++++++++++++++++++++++++++++++++++++
>  sheep/gateway.c          |    6 +-
>  sheep/object_cache.c     |   99 ++++---
>  sheep/ops.c              |   28 ++-
>  sheep/sheep_priv.h       |   12 +
>  sheep/vdi.c              |   97 +++++--
>  sheepfs/volume.c         |   64 ++++-
>  tests/functional/077     |   55 ++++
>  tests/functional/077.out |   11 +
>  tests/functional/group   |    1 +
>  18 files changed, 1310 insertions(+), 198 deletions(-)
>  create mode 100644 lib/sd_inode.c
>  create mode 100755 tests/functional/077
>  create mode 100644 tests/functional/077.out
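
For reference, the size limits quoted in the cover letter can be checked
with a small sketch. It assumes "1 million objects" means 2^20 entries
and that "MB" means MiB. The two structure sizes (12 and 8 bytes) are
hypothetical values, not taken from the sheepdog sources; they are used
only because their product makes the formula come out near the quoted
680PB figure.

```python
# Back-of-the-envelope check of the vdi size limits in the cover letter.
MIB = 2 ** 20
TIB = 2 ** 40
PIB = 2 ** 50

OBJECT_SIZE = 4 * MIB  # 4MB data objects, as stated in the cover letter


def array_limit_bytes(max_objects=2 ** 20):
    """Flat data_vdi_id[] array: one entry per data object.

    ASSUMPTION: "1 million objects" is read as 2**20 entries.
    """
    return max_objects * OBJECT_SIZE


def index_limit_bytes(index_bits=32):
    """Limit imposed by a 32-bit object index."""
    return (2 ** index_bits) * OBJECT_SIZE


def btree_limit_bytes(header_size, extent_size):
    """Two-level formula from the cover letter:
    (4MB / sizeof(sd_extent_header)) * (4MB / sizeof(sd_extent)) * 4MB
    """
    return (OBJECT_SIZE // header_size) * (OBJECT_SIZE // extent_size) * OBJECT_SIZE


# ASSUMPTION: illustrative sizes only, not read from sheepdog_proto.h.
ASSUMED_HEADER_SIZE = 12
ASSUMED_EXTENT_SIZE = 8

if __name__ == "__main__":
    print("array limit   :", array_limit_bytes() / TIB, "TiB")
    print("index limit   :", index_limit_bytes() / PIB, "PiB")
    print("b-tree limit  :",
          btree_limit_bytes(ASSUMED_HEADER_SIZE, ASSUMED_EXTENT_SIZE) / PIB,
          "PiB")
```

The first two results are exactly 4 TiB and 16 PiB. The third depends
entirely on the assumed structure sizes, so it should be recomputed with
the real sizeof() values before being relied on.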

Sorry for being away from sd development for so long.
Unfortunately, I still don't have enough time to review this
series.

My biggest concern is backward compatibility.  If this series can pass
our functionality tests, please merge it and move forward after
addressing Yuan's comments.

I am thinking of rebasing my object reclaim patch on top of this series
before releasing sheepdog 0.8.0, but any other suggestions would be
welcome.

Thanks,

Kazutaka


