[Sheepdog] [PATCH v5 06/17] farm: add sha1_file operations

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Wed Jan 4 22:04:30 CET 2012


At Fri, 30 Dec 2011 21:07:01 +0800,
Liu Yuan wrote:
> 
> From: Liu Yuan <tailai.ly at taobao.com>
> 
> All the objects(snap, trunk, data) in the farm is based on the
> operations of the sha1_file.
> 
> sha1_file provide us some useful features:
> 
> - Regardless of object type, all objects are all in deflated with zlib,
>   and have a header that not only specifies their tag, but also size
>   information about the data in the object.  It's worth noting that the
>   SHA1 hash that is used to name the object is always the hash of this
>   _compressed_ object, not the original data.

We can consider two options here:

 a) calculate SHA1 after compressing the input data
 b) calculate SHA1 directly on the uncompressed input data

I guess you assume that a) is faster than b), but it's not obvious to
me.

In addition, do we really need to calculate SHA1 of the content?  If
there are many data objects, updating the epoch would take too much time.

> 
> - the general consistency of an object can always be tested independently
>   of the contents or the type of the object: all objects can be validated
>   by verifying that
> 	(a) their hashes match the content of the file and
>   	(b) the object successfully inflates to a stream of bytes that
>   	forms a sequence of <sha1_file_hdr>  + <binary object data>
> 
> Signed-off-by: Liu Yuan <tailai.ly at taobao.com>
> ---
>  sheep/farm.h           |   41 +++++++
>  sheep/farm/sha1_file.c |  298 ++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 339 insertions(+), 0 deletions(-)
>  create mode 100644 sheep/farm.h
>  create mode 100644 sheep/farm/sha1_file.c
> 
> diff --git a/sheep/farm.h b/sheep/farm.h
> new file mode 100644
> index 0000000..a07928f
> --- /dev/null
> +++ b/sheep/farm.h
> @@ -0,0 +1,41 @@
> +#ifndef FARM_H
> +#define FARM_H
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <stdint.h>
> +#include <inttypes.h>
> +#include <memory.h>
> +#include <sys/types.h>
> +#include <sys/stat.h>
> +#include <fcntl.h>
> +#include <errno.h>
> +#include <sys/mman.h>
> +#include <linux/limits.h>
> +#include <openssl/sha.h>
> +#include <zlib.h>
> +
> +#include "sheepdog_proto.h"
> +#include "sheep.h"
> +#include "logger.h"
> +
> +#define SHA1_LEN        20
> +#define HEX_LEN         40
> +#define NAME_LEN        HEX_LEN
> +
> +struct sha1_file_hdr {
> +	char tag[TAG_LEN];
> +	uint64_t size;
> +	uint64_t priv;
> +	uint64_t reserved;
> +};
> +
> +/* sha1_file.c */
> +extern char *sha1_to_path(const unsigned char *sha1);
> +extern int sha1_file_write(unsigned char *buf, unsigned len, unsigned char *outsha1);
> +extern void * sha1_file_read(const unsigned char *sha1, struct sha1_file_hdr *hdr);
> +extern char * sha1_to_hex(const unsigned char *sha1);
> +extern int get_sha1_hex(const char *hex, unsigned char *sha1);
> +extern int sha1_file_delete(const unsigned char *sha1);
> +
> +#endif
> diff --git a/sheep/farm/sha1_file.c b/sheep/farm/sha1_file.c
> new file mode 100644
> index 0000000..bb4cf52
> --- /dev/null
> +++ b/sheep/farm/sha1_file.c
> @@ -0,0 +1,298 @@
> +/*
> + * Copyright (C) 2011 Taobao Inc.
> + *
> + * Liu Yuan <namei.unix at gmail.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License version
> + * 2 as published by the Free Software Foundation.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +/*
> + *   sha1_file provide us some useful features:
> + *
> + *   - Regardless of object type, all objects are all in deflated with zlib,
> + *     and have a header that not only specifies their tag, but also size
> + *     information about the data in the object.  It's worth noting that the
> + *     SHA1 hash that is used to name the object is always the hash of this
> + *     _compressed_ object, not the original data.
> + *
> + *   - the general consistency of an object can always be tested independently
> + *     of the contents or the type of the object: all objects can be validated
> + *     by verifying that
> + *       (a) their hashes match the content of the file and
> + *       (b) the object successfully inflates to a stream of bytes that
> + *       forms a sequence of <sha1_file_hdr>  + <binary object data>
> + */
> +#include <sys/types.h>
> +#include <sys/xattr.h>
> +
> +#include "farm.h"
> +#include "util.h"
> +
> +static inline char *get_object_directory(void)
> +{
> +	return farm_obj_dir;
> +}
> +
> +static void fill_sha1_path(char *pathbuf, const unsigned char *sha1)
> +{
> +	int i;
> +	for (i = 0; i < SHA1_LEN; i++) {
> +		static char hex[] = "0123456789abcdef";
> +		unsigned int val = sha1[i];
> +		char *pos = pathbuf + i*2 + (i > 0);
> +		*pos++ = hex[val >> 4];
> +		*pos = hex[val & 0xf];
> +	}
> +}
> +
> +char *sha1_to_path(const unsigned char *sha1)
> +{
> +
> +	static char buf[PATH_MAX];
> +	const char *objdir;
> +	int len;
> +
> +	objdir = get_object_directory();
> +	len = strlen(objdir);
> +
> +	/* '/' + sha1(2) + '/' + sha1(38) + '\0' */
> +	memcpy(buf, objdir, len);
> +	buf[len] = '/';
> +	buf[len+3] = '/';
> +	buf[len+42] = '\0';

Avoid magic numbers (3 and 42 here).  Please derive these offsets from
named constants such as HEX_LEN.
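
A rough sketch of one way to do this, not tested against the sheepdog
tree.  SHA1_DIR_HEX_LEN, SHA1_PATH_LEN and sha1_to_path_sketch() are
hypothetical names that do not exist in the patch, and objdir is passed
as a parameter only to keep the example self-contained:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

#define SHA1_LEN         20
#define HEX_LEN          (SHA1_LEN * 2)

/* Hypothetical constants, not part of the patch. */
#define SHA1_DIR_HEX_LEN 2	/* hex digits used as the subdirectory name */
/* '/' + sha1(2) + '/' + sha1(38), without the trailing '\0' */
#define SHA1_PATH_LEN    (1 + SHA1_DIR_HEX_LEN + 1 + (HEX_LEN - SHA1_DIR_HEX_LEN))

/* Same logic as in the patch: leaves a one-byte gap after the first byte. */
static void fill_sha1_path(char *pathbuf, const unsigned char *sha1)
{
	static const char hex[] = "0123456789abcdef";
	int i;

	for (i = 0; i < SHA1_LEN; i++) {
		unsigned int val = sha1[i];
		char *pos = pathbuf + i * 2 + (i > 0);

		*pos++ = hex[val >> 4];
		*pos = hex[val & 0xf];
	}
}

/* Hypothetical variant of sha1_to_path() with the offsets named. */
static char *sha1_to_path_sketch(const char *objdir, const unsigned char *sha1)
{
	static char buf[PATH_MAX];
	size_t len;

	assert(objdir != NULL && sha1 != NULL);
	len = strlen(objdir);

	/* Refuse object directories that would overflow the buffer. */
	if (len + SHA1_PATH_LEN + 1 > sizeof(buf))
		return NULL;

	memcpy(buf, objdir, len);
	buf[len] = '/';
	buf[len + 1 + SHA1_DIR_HEX_LEN] = '/';
	buf[len + SHA1_PATH_LEN] = '\0';
	fill_sha1_path(buf + len + 1, sha1);

	return buf;
}
```

With these definitions the offsets work out to the same values as in the
patch (len+3 and len+42), but the layout is stated in one place.  The
length check is an extra assumption of mine; the patch does not have it.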

Thanks,

Kazutaka


