[Sheepdog] [PATCH v4 03/12] farm: add sha1_file operations

Liu Yuan namei.unix at gmail.com
Wed Dec 28 14:50:53 CET 2011


On 12/28/2011 09:19 PM, Christoph Hellwig wrote:

> On Sun, Dec 25, 2011 at 11:42:49PM +0800, Liu Yuan wrote:
>> From: Liu Yuan <tailai.ly at taobao.com>
>>
>> All the objects(snap, trunk, data) in the farm is based on the
>> operations of the sha1_file.
>>
>> sha1_file provide us some useful features:
>>
>> - Regardless of object type, all objects are all in deflated with zlib,
>>   and have a header that not only specifies their tag, but also size
>>   information about the data in the object.  It's worth noting that the
>>   SHA1 hash that is used to name the object is always the hash of this
>>   _compressed_ object, not the original data.
> 
> why?


Compression is supposed to reduce the storage usage for snapshot
objects. And I assume hashing the compressed content is faster than the
original one.

If not, any benefits?

> 
>>
>> - the general consistency of an object can always be tested independently
>>   of the contents or the type of the object: all objects can be validated
>>   by verifying that
>> 	(a) their hashes match the content of the file and
>>   	(b) the object successfully inflates to a stream of bytes that
>>   	forms a sequence of <sha1_file_hdr>  + <binary object data>
> 
> These comments should not only be in the commit message but also in
> the source code.
> 


I have written it in doc/farm-internals.txt, and it looks a little bit
redundant to include them in the files, isn't it?

>> --- /dev/null
>> +++ b/sheep/farm.h
> 
> Any reason the header is not inside the farm/ subdirectory?
> 
>> +	return farm_obj_dir;
>> +}
>> +
>> +static void fill_sha1_path(char *pathbuf, const unsigned char *sha1)
>> +{
>> +	int i;
>> +	for (i = 0; i < SHA1_LEN; i++) {
>> +		static char hex[] = "0123456789abcdef";
> 
> I'd rather have a define for the hex chars, and defined SHA1_LEN
> as strlen(HEX_CHARS) which any recent compiler should optimize to a
> constant expression.
> 


Okay.

>> +	objdir = get_object_directory();
>> +	len = strlen(objdir);
>> +
>> +	/* '/' + sha1(2) + '/' + sha1(38) + '\0' */
>> +	if (len + 43 > PATH_MAX)
>> +		panic("insanely long object directory %s", objdir);
> 
> I'd rather check the path length during startup and refuse to start
> sheep instead of aborting later on.

Okay.

>> +	memcpy(buf, objdir, len);
>> +	buf[len] = '/';
>> +	buf[len+3] = '/';
>> +	buf[len+42] = '\0';
>> +	fill_sha1_path(buf + len + 1, sha1);
> 
> Can't we use the strbuf helpers here to make the code readable and
> less fragile?
> 
>> +		setxattr(name, CNAME, &count, CSIZE, 0);
> 
> error handling?

Okay.

>> +	void *map;
>> +
>> +	if (fd < 0) {
>> +		perror(filename);
>> +		return NULL;
>> +	}
>> +	if (fstat(fd, &st) < 0) {
>> +		close(fd);
>> +		return NULL;
>> +	}
>> +	map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
>> +	close(fd);
>> +	if (-1 == (int)(long)map)
> 
> just use a
> 
> 	if (map == MAP_FAILED)
> 
> here.
> 


Thanks for the tip.

Thanks,
Yuan



More information about the sheepdog mailing list