[sheepdog] [PATCH v2 3/6] farm: trim data before calculating sha1

Liu Yuan namei.unix at gmail.com
Sat Jun 8 07:54:50 CEST 2013


On 06/08/2013 10:02 AM, Kai Zhang wrote:
> sha1.c trims data before calculating sha1.
> So that collie and sheep use the same algorithm to calculate sha1 value.

Trimming objects makes the sha1 calculation faster for now, but I guess
we will have to drop trimming later in favor of a finer-grained store,
that is, mapping one object onto multiple smaller fixed-size units to get
better de-duplication. I am thinking of the following scheme:

 0 suppose we store objects as 128K chunks
 1 we get the sha1 of this object, which can be cut into 32 * 128K chunks
 2 get the sha1 of every chunk and store these chunks into sha1_file
 3 store these 32 sha1 values into the sha1_file of this object
 4 so loading this object is split into 32 operations that read the
   sha1_file of each chunk, indexed by the sha1_file of this object.
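
To make the steps above concrete, here is a rough sketch in Python (not
sheep's C). It assumes a 4M object, which is what gives 32 * 128K chunks,
and it uses two in-memory dicts as a stand-in for sha1_file storage. The
names chunk_store, index_store, object_write and object_read are made up
for illustration and are not farm APIs.

```python
import hashlib

CHUNK_SIZE = 128 * 1024          # step 0: assumed chunk size
OBJECT_SIZE = 4 * 1024 * 1024    # assumed object size, 32 chunks of 128K

chunk_store = {}   # hypothetical: chunk sha1 -> chunk data
index_store = {}   # hypothetical: object sha1 -> list of chunk sha1 values


def object_write(obj):
    """Store obj as fixed-size chunks and return the sha1 of the object."""
    obj_key = hashlib.sha1(obj).hexdigest()            # step 1
    keys = []
    for off in range(0, len(obj), CHUNK_SIZE):
        chunk = obj[off:off + CHUNK_SIZE]
        key = hashlib.sha1(chunk).hexdigest()          # step 2
        chunk_store[key] = chunk    # identical chunks share one entry
        keys.append(key)
    index_store[obj_key] = keys                        # step 3
    return obj_key


def object_read(obj_key):
    """Step 4: one read per chunk, driven by the object's sha1 list."""
    return b"".join(chunk_store[k] for k in index_store[obj_key])
```

An object whose chunks repeat (for example a mostly zero-filled one) ends
up with 32 index entries but far fewer stored chunks, which is where the
de-duplication gain would come from.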

We can even compress chunks before writing sha1_file to save space. 128K
is only an example: we can try other values such as 256K or 512K and
compare space and time before we settle on one as the best value.
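
A rough way to compare chunk sizes on the space side, again as a Python
sketch. It assumes zlib for compression and takes the sha1 of the
uncompressed chunk as the key; both are my assumptions, and stored_bytes
is a made-up helper, not something in farm. It does not measure the time
side (smaller chunks mean more reads per object).

```python
import hashlib
import zlib


def stored_bytes(objects, chunk_size, compress=False):
    """Return the bytes needed to hold every unique chunk of the objects."""
    unique = {}   # chunk sha1 -> size of the chunk as stored
    for obj in objects:
        for off in range(0, len(obj), chunk_size):
            chunk = obj[off:off + chunk_size]
            # Assumption: hash the uncompressed chunk, so that the
            # compression setting does not change what de-duplicates.
            key = hashlib.sha1(chunk).hexdigest()
            if key not in unique:
                data = zlib.compress(chunk) if compress else chunk
                unique[key] = len(data)
    return sum(unique.values())


if __name__ == "__main__":
    K = 1024
    a = b"\0" * (512 * K)
    b = b"\0" * (128 * K) + b"\x01" * (384 * K)
    for size in (128 * K, 256 * K, 512 * K):
        print(size // K, "K:", stored_bytes([a, b], size),
              stored_bytes([a, b], size, compress=True))
```

With these two toy objects the smaller chunk size stores fewer unique
bytes, but real VM data would have to be measured before picking a value.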

Thanks,
Yuan


