At Thu, 05 Jan 2012 11:51:37 +0800,
Liu Yuan wrote:
>
> On 01/05/2012 05:04 AM, MORITA Kazutaka wrote:
>
> > We can consider two options here:
> >
> >  a) calculate SHA1 after compressing the input data
> >  b) calculate SHA1 directly
> >
> > I guess you assume that a) is faster than b), but it's not obvious to
> > me.
> >
>
> I was wrong. I wrote a test and it shows that a) is slower than b),
> but we would gain storage reduction from the compressed content.
>
> I simply copied what sha1_file_write() does into the test file. The result
> is below, for a 4M object with "010101010101..." content, which is ideal
> for compression.
>
> tailai.ly at taobao:~$ time ./a.out 1
> sheepdog/store/0/obj/00000001/007c2b2500000001
>
> real 0m0.062s
> user 0m0.060s
> sys 0m0.000s
> tailai.ly at taobao:~$ ll -h store
> -rw-r--r-- 1 tailai.ly tailai.ly 4.0K 2012-01-05 11:36 store
> tailai.ly at taobao:~$ time ./a.out 0
> sheepdog/store/0/obj/00000001/007c2b2500000001
>
> real 0m0.036s
> user 0m0.030s
> sys 0m0.000s
> tailai.ly at taobao:~$ ll -h store
> -rw-r--r-- 1 tailai.ly tailai.ly 4.0M 2012-01-05 11:37 store
>
> It shows that we get a good size reduction, trading off CPU cycles.
> I also tried vmlinuxz (4.2M) on my laptop, and the compressed one is 10x
> slower than the uncompressed. For such an object with random content, we
> both burn cycles and lose the size reduction.
>
> With these numbers, I am not sure which one, CPU cycles or
> storage space, is more important than the other. I am okay to drop
> the compression logic.

IMHO, this kind of feature should be optional, and the default
behavior should be no compression. I think it is better to drop the
compression feature in this initial version.

Thanks,

Kazutaka
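
For readers who want to reproduce the comparison, options a) and b) above can be sketched roughly as follows. This is not the test program from the thread: it is a minimal Python sketch that assumes zlib as the compressor (the message does not say which one the test used), and the function names are hypothetical. Timings will differ by machine, so no expected numbers are given.

```python
import hashlib
import time
import zlib


def sha1_compressed(data: bytes) -> tuple[str, int]:
    """Option a): compress first (zlib is an assumption), then hash the result."""
    packed = zlib.compress(data)
    return hashlib.sha1(packed).hexdigest(), len(packed)


def sha1_direct(data: bytes) -> tuple[str, int]:
    """Option b): hash the raw bytes; the object is stored uncompressed."""
    return hashlib.sha1(data).hexdigest(), len(data)


def timed(fn, data: bytes) -> tuple[float, int]:
    """Return (elapsed_seconds, stored_size_in_bytes) for one call of fn."""
    start = time.perf_counter()
    _, size = fn(data)
    return time.perf_counter() - start, size


if __name__ == "__main__":
    # 4 MiB of "0101..." as in the quoted test: an ideal case for compression.
    obj = b"01" * (2 * 1024 * 1024)
    for name, fn in (("a) compress + SHA1", sha1_compressed),
                     ("b) SHA1 only", sha1_direct)):
        elapsed, size = timed(fn, obj)
        print(f"{name}: {elapsed:.4f}s, {size} bytes stored")
```

Note that in option a) the digest identifies the compressed bytes, so it depends on the compressor and its settings, whereas in option b) it depends only on the object content.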