[Sheepdog] [PATCH 0/6] merge sheep and puppy into one sheepdog daemon, collie

Fri Dec 4 13:32:58 CET 2009

FUJITA Tomonori <fujita.tomonori at lab.ntt.co.jp> writes:

> We need the data of a single write system call to be applied to a
> super object in the "all or nothing" way (I assume BTRFS does but it
> does not?).

Ah right, okay, I understand. I wasn't aware that btrfs gave that guarantee
without BTRFS_IOC_TRANS_START, but looking at the btrfs code, I think you're
absolutely right that it does. The copy-on-write implementation of metadata
and data makes it quite hard for it to do otherwise, I imagine. Nice
property, anyway.

[For the benefit of the list archives, the other 'special feature' of the
filesystem we're using at the moment is setting the user.sheepdog.copies
xattr on the object files, which is why the fs needs to have extended
attributes enabled.]

> I guess that it takes some time until BTRFS matures so we've been thinking
> about other options. One is using Berkeley DB for a super object.

A random off-the-wall suggestion: I wonder if it would be possible to use a
filesystem directory tree to store the catalogue information instead of a
single large database file or the current large block file. rename(2),
link(2) and even symlink(2) are atomic on all POSIX filesystems, and are
presumably optimised to be reasonably fast (?). If, instead of overwriting
part of a large file, we wrote a new tiny file and then rename(2)d it over
the top of the original tiny file, we would get atomic replacement for free
pretty much everywhere.
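
To make the idea concrete, here is a minimal sketch of the write-then-rename
pattern in Python. It is illustration only, not sheepdog code: the function
name atomic_write and the ".tmp-" prefix are made up, and it assumes each
catalogue entry lives in its own small file. The temporary file is created in
the same directory as the target so that the rename never crosses a
filesystem boundary.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the file at `path` with `data` (bytes) in one atomic step.

    Hypothetical helper: readers see either the old contents or the new
    contents, never a partially written file.
    """
    directory = os.path.dirname(os.path.abspath(path))

    # Write the new contents to a temporary file next to the target.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Push the data to disk before the new name becomes visible.
            os.fsync(tmp.fileno())
        # os.replace() uses rename(2) on POSIX systems and atomically
        # replaces an existing file at `path`.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise

    # Sync the directory so that the rename itself survives a crash.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Calling atomic_write("entry", b"new contents") twice in a row leaves a single
file holding the second value. The fsync calls are about durability after a
crash rather than atomicity, and they are likely to be where most of the cost
goes, so they are the part to measure first.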

I'm not sure how well the sheepdog catalogue fits into such a scheme though,
or whether this would perform better or worse than the current approach.

Best wishes,

Chris.