[sheepdog] [PATCH 1/4] sheep: don't trim before calculating sha1

Thu Jul 18 08:39:46 CEST 2013

At Tue, 16 Jul 2013 17:30:18 +0800,
Liu Yuan wrote:
> 
> Trimming the object before getting sha1 doesn't give us much benefit because
> 1. Most objects can't be trimmed

Don't we plan to include the object reclaim patchset, which introduces
many sparse objects (ledger objects, deleted vdi objects)?

> 2. Require farm to trim the object again
>    - need malloc() tmp space for the trim

We can do memmove() outside of trim_zero_blocks().  Then, we can avoid malloc()
for the trim operation when we don't want to update the buffer.
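
A minimal sketch of what I mean, not the actual sheepdog code.  The names
find_nonzero_range(), trim_in_place() and TRIM_BLOCK_SIZE are hypothetical,
and the block size is an assumption.  The scan only reports the non-zero
range and never writes to the buffer; memmove() happens only in the caller
that wants the buffer compacted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical trim granularity; the real value in sheepdog may differ. */
#define TRIM_BLOCK_SIZE 4096

/* Return 1 if the n bytes at p are all zero, 0 otherwise. */
int block_is_zero(const unsigned char *p, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (p[i])
			return 0;
	return 1;
}

/*
 * Hypothetical read-only scan: report where the non-zero data starts
 * (*start, relative to buf) and how long it is (*len).  buf is never
 * modified, so no temporary buffer is needed.
 */
void find_nonzero_range(const void *buf, uint32_t buflen,
			uint32_t *start, uint32_t *len)
{
	const unsigned char *p = buf;
	uint32_t s = 0, e = buflen;

	assert(buf != NULL);

	/* Skip whole zero blocks at the head. */
	while (e - s >= TRIM_BLOCK_SIZE && block_is_zero(p + s, TRIM_BLOCK_SIZE))
		s += TRIM_BLOCK_SIZE;
	/* Drop whole zero blocks at the tail. */
	while (e - s >= TRIM_BLOCK_SIZE &&
	       block_is_zero(p + e - TRIM_BLOCK_SIZE, TRIM_BLOCK_SIZE))
		e -= TRIM_BLOCK_SIZE;

	*start = s;
	*len = e - s;
}

/*
 * Caller that does want the buffer updated: move the non-zero range to
 * the front with memmove() and return its length.
 */
uint32_t trim_in_place(void *buf, uint32_t buflen)
{
	uint32_t start, len;

	find_nonzero_range(buf, buflen, &start, &len);
	if (len > 0 && start > 0)
		memmove(buf, (unsigned char *)buf + start, len);
	return len;
}
```

A caller that only needs the digest would call find_nonzero_range() and pass
(buf + start, len) to its hash function directly, with no copy at all.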

> 
> This is all about sha1ing more bytes vs trimming the object, which is faster.
> Both will take cpu cycles and no big win one over another.

Please give us a benchmark result before making this kind of change.
In my environment (Intel Core i7-3930K CPU 3.20 GHz), there was a big
difference.

* benchmark program

int main(int argc, char **argv)
{
	static unsigned char buf[SD_DATA_OBJ_SIZE] = {};
	int cnt;
	uint64_t offset;
	uint32_t len;
	unsigned char sha1[SHA1_DIGEST_SIZE];
	struct sha1_ctx c;

	cnt = atoi(argv[1]);
	if (strcmp(argv[2], "trim") == 0) {
		for (int i = 0; i < cnt; i++) {
			offset = 0;
			len = 0;
			trim_zero_blocks(buf, &offset, &len);

			sha1_init(&c);
			sha1_update(&c, buf, 0);
			sha1_final(&c, sha1);
		}
	} else if (strcmp(argv[2], "sha1") == 0) {
		for (int i = 0; i < cnt; i++) {
			sha1_init(&c);
			sha1_update(&c, buf, sizeof(buf));
			sha1_final(&c, sha1);
		}
	}
	return 0;
}

* result

$ time ./sheep/sheep 10000 trim

real    0m0.013s
user    0m0.012s
sys     0m0.000s

$ time ./sheep/sheep 10000 sha1

real    1m59.807s
user    1m59.799s
sys     0m0.004s

This means that calculating sha1 over 10,000 untrimmed objects adds about
2 minutes of overhead.
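
To put the sha1 result in per-object terms, here is a rough calculation,
assuming SD_DATA_OBJ_SIZE is 4 MiB.  The helper names are only for
illustration.

```c
/* Assumption: SD_DATA_OBJ_SIZE is 4 MiB. */
#define OBJ_SIZE_MIB 4.0

/* Average time spent per object, in milliseconds. */
double per_object_ms(double total_sec, unsigned int count)
{
	return total_sec * 1000.0 / count;
}

/* Hashing throughput in MiB per second over the whole run. */
double throughput_mib_per_sec(double total_sec, unsigned int count)
{
	return count * OBJ_SIZE_MIB / total_sec;
}
```

With the measured 119.807 seconds for 10,000 objects, this works out to about
12 ms per object, or roughly 334 MiB/s of sha1 throughput.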

Thanks,

Kazutaka