[Sheepdog] object sizes

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Fri Nov 11 16:22:11 CET 2011


At Fri, 11 Nov 2011 07:59:54 -0500,
Christoph Hellwig wrote:
> 
> Currently sheepdog uses a fixed object size of 4 Megabytes.  For getting
> good performance out of spinning media this relatively small size has
> a few issues:
> 
>  - disks need 64 to 128 contiguous megabytes to make the seek penalty
>    invisible during normal streaming workloads.  While at least XFS
>    is fairly good at laying out multiple sheepdog objects contiguously
>    for a single writer, we still get occasional metadata allocations
>    in between.  The situation is much worse if we have multiple parallel
>    writers and they hit the same allocation group.
>  - there is a non-trivial metadata overhead, e.g. for a 1GB streaming
>    write to a sheepdog volume we need to allocate 256 inodes, and flush
>    them out before returning to the caller with the current write
>    through cache model, which all cause seeks.
> 
> To fix this I'd love to see the option of an object size ~128MB.  There
> are two issues with that:
> 
>  - we will use up more space if randomly written into the volume.
>    For most cloud setups this is entirely acceptable, though.
>  - we need to copy a lot of data when processing copy on write requests.
> 
> The latter is more of a concern to me. I can think of two mitigation
> strategies:
> 
>  - make the COW block size smaller than the object size.  This is
>    similar to the subclusters under development for qcow2.  This
>    could e.g. be implemented by an extended attribute on the file
>    containing a bitmap of regions in an object that haven't been
>    copied up yet.
>  - make use of features in some filesystems to create a new file that
>    shares the data with an existing file, aka reflinks in ocfs2
>    and btrfs.
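
The first mitigation above (a COW block size smaller than the object size,
tracked by a bitmap) can be sketched roughly as follows. This is only an
illustration under stated assumptions, not Sheepdog code: the 128 MB object
size, the 4 MB COW block size, and the names CowObject, write and
reads_from_parent are all hypothetical, and the bitmap is kept in memory
here rather than in an extended attribute.

```python
from dataclasses import dataclass

# Hypothetical sizes, chosen only for illustration.
OBJECT_SIZE = 128 * 1024 * 1024       # large object size discussed in the thread
COW_BLOCK_SIZE = 4 * 1024 * 1024      # assumed COW granularity inside an object
BLOCKS_PER_OBJECT = OBJECT_SIZE // COW_BLOCK_SIZE


@dataclass
class CowObject:
    """One object of a cloned volume, tracked at COW-block granularity."""

    # Bit i set means block i has NOT been copied up from the parent yet.
    pending: int = (1 << BLOCKS_PER_OBJECT) - 1
    # Running total of bytes copied from the parent (stand-in for real I/O).
    bytes_copied: int = 0

    def write(self, offset: int, length: int) -> None:
        """Copy up only the blocks touched by [offset, offset + length)."""
        if length <= 0 or offset < 0 or offset + length > OBJECT_SIZE:
            raise ValueError("write outside object bounds")
        first = offset // COW_BLOCK_SIZE
        last = (offset + length - 1) // COW_BLOCK_SIZE
        for block in range(first, last + 1):
            mask = 1 << block
            if self.pending & mask:
                # A real store would copy the block's data from the parent here.
                self.bytes_copied += COW_BLOCK_SIZE
                self.pending &= ~mask

    def reads_from_parent(self, offset: int) -> bool:
        """True if data at this offset still lives only in the parent object."""
        return bool(self.pending & (1 << (offset // COW_BLOCK_SIZE)))


obj = CowObject()
obj.write(0, 4096)  # a small write copies one 4 MB block, not all 128 MB
print(obj.bytes_copied, obj.reads_from_parent(COW_BLOCK_SIZE))
```

With these assumed sizes the bitmap needs only 32 bits per object, so it
would fit comfortably in an extended attribute. The sketch always copies a
block before its first write; a real implementation could skip the copy
when a write covers a whole block.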

Yuan is currently working on adding some features to the Sheepdog store,
and I heard that he'll send the patch next week.

Yuan, will something like a cluster-wide reflink be supported in the
framework you are implementing?

Thanks,

Kazutaka

> 
> Did anyone look into larger object sizes?