[Sheepdog] object sizes
MORITA Kazutaka
morita.kazutaka at lab.ntt.co.jp
Fri Nov 11 16:22:11 CET 2011
At Fri, 11 Nov 2011 07:59:54 -0500,
Christoph Hellwig wrote:
>
> Currently sheepdog uses a fixed object size of 4 Megabytes. For getting
> good performance out of spinning media this relatively small size has
> a few issues:
>
> - disks need 64 to 128 contiguous megabytes to make the seek penalty
>   invisible during normal streaming workloads. While at least XFS
>   is fairly good at laying out multiple sheepdog objects contiguously
>   for a single writer we still get occasional metadata allocation
>   in between. The situation is much worse if we have multiple parallel
>   writers and they hit the same allocation group.
> - there is a non-trivial metadata overhead, e.g. for a 1GB streaming
>   write to a sheepdog volume we need to allocate 256 inodes, and flush
>   them out before returning to the caller with the current write
>   through cache model, which all cause seeks.
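
For reference, the inode count in the second point follows directly from
the object size. The sketch below is an illustration only, not code from
Sheepdog; the helper name objects_for_write is hypothetical, and it
assumes the write is aligned to object boundaries and that "1GB" and
"4 Megabytes" mean 1 GiB and 4 MiB.

```python
MiB = 1024 * 1024
GiB = 1024 * MiB


def objects_for_write(write_bytes, object_size):
    """Number of fixed-size objects an aligned streaming write touches.

    Hypothetical helper: each object is assumed to be one file (one inode)
    on the backing filesystem.
    """
    return -(-write_bytes // object_size)  # ceiling division


if __name__ == "__main__":
    for size_mib in (4, 128):
        n = objects_for_write(1 * GiB, size_mib * MiB)
        print(f"{size_mib} MiB objects: {n} inodes per 1 GiB write")
```

With 4 MiB objects this gives the 256 inodes mentioned above; with the
proposed ~128 MiB objects the same write would touch 8.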
>
> To fix this I'd love to see the option of an object size ~128MB. There
> are two issues with that:
>
> - we will use up more space if randomly written into the volume.
>   For most cloud setups this is entirely acceptable, though.
> - we need to copy a lot of data when processing copy on write requests.
>
> The latter is more of a concern to me. I can think of two mitigation
> strategies:
>
> - make the COW block size smaller than the object size. This is
>   similar to the subclusters under development for qcow2. This
>   could e.g. be implemented by an extended attribute on the file
>   containing a bitmap of regions in an object that haven't been
>   copied up yet.
> - make use of features in some filesystems to create a new file that
>   shares the data with an existing file, aka reflinks in ocfs2
>   and btrfs.
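
The bitmap idea in the first strategy could look roughly like the sketch
below. This is an illustration under stated assumptions, not Sheepdog
code: the class name CowBitmap, the 1 MiB COW block size, and the
one-bit-per-block layout are all hypothetical. A set bit means the region
has not been copied up from the parent object yet.

```python
MiB = 1024 * 1024


class CowBitmap:
    """Tracks which sub-object regions still live only in the parent object.

    Hypothetical layout: one bit per COW block, bit set = not copied up yet.
    """

    def __init__(self, object_size, block_size):
        if object_size % block_size != 0:
            raise ValueError("object size must be a multiple of block size")
        self.block_size = block_size
        self.nblocks = object_size // block_size
        self.bits = bytearray((self.nblocks + 7) // 8)
        for i in range(self.nblocks):
            self.bits[i // 8] |= 1 << (i % 8)  # everything pending at first

    def _pending(self, i):
        return bool(self.bits[i // 8] & (1 << (i % 8)))

    def blocks_to_copy(self, offset, length):
        """Blocks that a write to [offset, offset + length) must copy first.

        For simplicity this also returns blocks the write fully overwrites,
        although those would not strictly need their old data copied.
        """
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        return [i for i in range(first, last + 1) if self._pending(i)]

    def mark_copied(self, i):
        self.bits[i // 8] &= ~(1 << (i % 8)) & 0xFF

    def to_bytes(self):
        """Serialized form, e.g. the value one might store in an xattr."""
        return bytes(self.bits)


if __name__ == "__main__":
    bm = CowBitmap(128 * MiB, 1 * MiB)
    todo = bm.blocks_to_copy(5 * MiB + 100, 4096)
    print("copy up blocks:", todo)
    for i in todo:
        bm.mark_copied(i)
    print("second write copies:", bm.blocks_to_copy(5 * MiB + 100, 4096))
```

With these assumed sizes, a small write into a 128 MiB object copies one
1 MiB block instead of the whole object, and the bitmap itself is 16
bytes. On Linux the serialized bitmap could be stored with something like
os.setxattr() under an attribute name of one's choosing; that part is left
out here.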

Currently, Yuan is trying to add some features to the Sheepdog store, and
I heard that he'll send the patch next week.

Yuan, will something like a cluster-wide reflink be supported in the
framework you are implementing?

Thanks,

Kazutaka

>
> Did anyone look into larger object sizes?