[Sheepdog] object sizes
Christoph Hellwig
hch at infradead.org
Fri Nov 11 13:59:54 CET 2011
Currently sheepdog uses a fixed object size of 4 megabytes. For getting
good performance out of spinning media this relatively small size
causes a few issues:
 - disks need 64 to 128 contiguous megabytes to make the seek penalty
   invisible during normal streaming workloads.  While at least XFS
   is fairly good at laying out multiple sheepdog objects contiguously
   for a single writer, we still get occasional metadata allocations
   in between.  The situation is much worse if we have multiple
   parallel writers that hit the same allocation group.
 - there is a non-trivial metadata overhead.  E.g. for a 1GB streaming
   write to a sheepdog volume we need to allocate 256 inodes (one file
   per 4MB object) and, with the current write-through cache model,
   flush them out before returning to the caller, all of which causes
   seeks.
To fix this I'd love to see the option of an object size of ~128MB.
There are two issues with that:
 - we will use up more space if the volume is written to randomly.
   For most cloud setups this is entirely acceptable, though.
- we need to copy a lot of data when processing copy on write requests.
The latter is more of a concern to me. I can think of two mitigation
strategies:
 - make the COW block size smaller than the object size.  This is
   similar to the subclusters under development for qcow2.  It could
   e.g. be implemented by an extended attribute on the file containing
   a bitmap of the regions in an object that haven't been copied up
   yet; see the first sketch below.
 - make use of features in some filesystems to create a new file that
   shares its data with an existing file, aka reflinks in ocfs2 and
   btrfs; see the second sketch below.
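
A minimal sketch of that bitmap idea, assuming a 128MB object split
into 4MB copy-up chunks (so the 32 bits fit in a single u32).  The
attribute name "user.sheepdog.cow_bitmap" and both helpers are made
up for illustration; nothing like this exists in sheepdog today:

#include <stdint.h>
#include <sys/xattr.h>

#define OBJECT_SIZE     (128ULL << 20)
#define COW_CHUNK       (4ULL << 20)
#define NR_CHUNKS       (OBJECT_SIZE / COW_CHUNK)       /* 32 */

#define COW_XATTR       "user.sheepdog.cow_bitmap"

/* Has the chunk containing @offset been copied up already? */
static int chunk_copied(int fd, uint64_t offset)
{
        uint32_t bitmap = 0;

        /* No attribute yet means nothing has been copied up. */
        if (fgetxattr(fd, COW_XATTR, &bitmap, sizeof(bitmap)) < 0)
                return 0;
        return !!(bitmap & (1U << (offset / COW_CHUNK)));
}

/* Record that the chunk containing @offset was copied up. */
static int mark_chunk_copied(int fd, uint64_t offset)
{
        uint32_t bitmap = 0;

        fgetxattr(fd, COW_XATTR, &bitmap, sizeof(bitmap));
        bitmap |= 1U << (offset / COW_CHUNK);
        return fsetxattr(fd, COW_XATTR, &bitmap, sizeof(bitmap), 0);
}

A write to an uncopied chunk would then copy up just the affected
4MB from the parent object and set its bit, instead of copying the
full 128MB.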
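
And a sketch of the reflink route, btrfs-specific: BTRFS_IOC_CLONE
(aliased to the generic FICLONE ioctl on newer kernels) makes the
child file share all extents with the parent, so the copy-up cost
disappears entirely.  ocfs2 exposes reflinks through its own
interface, and the clone_object() helper name is hypothetical:

#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>           /* FICLONE */

/* Create @child as a data-sharing clone of @parent; no data copy. */
static int clone_object(const char *parent, const char *child)
{
        int src, dst, ret;

        src = open(parent, O_RDONLY);
        if (src < 0)
                return -1;
        dst = open(child, O_WRONLY | O_CREAT | O_EXCL, 0600);
        if (dst < 0) {
                close(src);
                return -1;
        }
        /* Fails with EOPNOTSUPP on filesystems without reflinks. */
        ret = ioctl(dst, FICLONE, src);
        close(src);
        close(dst);
        return ret;
}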
Has anyone looked into larger object sizes?