[sheepdog] [PATCH 1/4] sheep: don't trim before calculating sha1
Liu Yuan
namei.unix at gmail.com
Thu Jul 18 08:49:54 CEST 2013
On Thu, Jul 18, 2013 at 03:39:46PM +0900, MORITA Kazutaka wrote:
> At Tue, 16 Jul 2013 17:30:18 +0800,
> Liu Yuan wrote:
> >
> > Trimming the object before computing its sha1 doesn't give us much benefit because
> > 1. Most objects can't be trimmed
>
> We have a plan to include the object reclaim patchset which introduces
> many sparse objects (ledger objects, deleted vdi objects), no?
>
Ah, I didn't think of this situation. I thought we wouldn't have many sparse
objects in the production environment.
> > 2. It requires farm to trim the object again
> > - need malloc() tmp space for the trim
>
> We can do memmove() outside of trim_zero_blocks(). Then, we can avoid malloc()
> for the trim operation when we don't want to update the buffer.
>
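The suggestion above (keep the scan separate from the memmove() so that read-only callers need no temporary buffer) could look roughly like the sketch below. It is only an illustration, not the actual trim_zero_blocks() code: find_nonzero_range(), block_is_zero() and BLOCK_SIZE are hypothetical names, the 4096-byte granularity is an assumption, and the buffer size is assumed to be a multiple of BLOCK_SIZE.

```c
#include <stddef.h>

/* Assumed trim granularity; hypothetical, not taken from the sheepdog source. */
#define BLOCK_SIZE 4096u

/* Return 1 if the n bytes at p are all zero, 0 otherwise. */
static int block_is_zero(const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (p[i] != 0)
            return 0;
    return 1;
}

/*
 * Hypothetical helper: report the block-aligned range of buf that remains
 * after dropping leading and trailing all-zero blocks. The buffer is never
 * modified, so no malloc() or memmove() is needed here.
 * Assumes size is a multiple of BLOCK_SIZE.
 */
static void find_nonzero_range(const unsigned char *buf, size_t size,
                               size_t *offset, size_t *len)
{
    size_t start = 0, end = size;

    /* Skip whole zero blocks at the head. */
    while (start + BLOCK_SIZE <= end &&
           block_is_zero(buf + start, BLOCK_SIZE))
        start += BLOCK_SIZE;

    /* Skip whole zero blocks at the tail. */
    while (end >= start + BLOCK_SIZE &&
           block_is_zero(buf + end - BLOCK_SIZE, BLOCK_SIZE))
        end -= BLOCK_SIZE;

    *offset = start;
    *len = end - start;
}
```

A caller that really wants a compacted buffer can follow this with memmove(buf, buf + offset, len), while a caller that only hashes can pass buf + offset and len to sha1_update() and skip the copy.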
> >
> > This is all about which is faster: sha1ing more bytes or trimming the object.
> > Both take CPU cycles, and neither is a big win over the other.
>
> Please give us a benchmark result before making this kind of change.
> On my environment (Intel Core i7-3930K CPU 3.20 GHz), there was a big
> difference.
>
> * benchmark program
>
> int main(int argc, char **argv)
> {
>     static unsigned char buf[SD_DATA_OBJ_SIZE] = {};
>     int cnt;
>     uint64_t offset;
>     uint32_t len;
>     unsigned char sha1[SHA1_DIGEST_SIZE];
>     struct sha1_ctx c;
>
>     cnt = atoi(argv[1]);
>     if (strcmp(argv[2], "trim") == 0) {
>         for (int i = 0; i < cnt; i++) {
>             offset = 0;
>             len = 0;
>             trim_zero_blocks(buf, &offset, &len);
>
>             sha1_init(&c);
>             sha1_update(&c, buf, 0);
>             sha1_final(&c, sha1);
>         }
>     } else if (strcmp(argv[2], "sha1") == 0) {
>         for (int i = 0; i < cnt; i++) {
>             sha1_init(&c);
>             sha1_update(&c, buf, sizeof(buf));
>             sha1_final(&c, sha1);
>         }
>     }
>     return 0;
> }
>
> * result
>
> $ time ./sheep/sheep 10000 trim
>
> real 0m0.013s
> user 0m0.012s
> sys 0m0.000s
>
> $ time ./sheep/sheep 10000 sha1
>
> real 1m59.807s
> user 1m59.799s
> sys 0m0.004s
>
>
> This means that calculating sha1 for 10,000 objects causes 2 minutes of overhead.
Okay, your test says it well enough. Trimming is obviously faster than hashing. If
we have many sparse objects, we can benefit from it a lot. Please drop this patch.
I think you can apply the other 3 patches cleanly.
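
For a rough sense of scale, the "sha1" run above can be turned into a throughput figure. This is only back-of-the-envelope arithmetic: the object size is assumed to be 4 MiB (the thread does not state the value of SD_DATA_OBJ_SIZE), and sha1_mib_per_sec() and ASSUMED_OBJ_SIZE are hypothetical names used just for this illustration.

```c
#include <stdint.h>

/* Assumption: 4 MiB data objects; SD_DATA_OBJ_SIZE is not given in the thread. */
#define ASSUMED_OBJ_SIZE (UINT64_C(1) << 22)

/*
 * Hypothetical helper: MiB hashed per second when `objects` buffers of
 * `obj_size` bytes each are hashed in `seconds` of wall-clock time.
 */
static double sha1_mib_per_sec(uint64_t objects, uint64_t obj_size,
                               double seconds)
{
    double mib = (double)(objects * obj_size) / (1024.0 * 1024.0);

    return mib / seconds;
}
```

Under that assumption, sha1_mib_per_sec(10000, ASSUMED_OBJ_SIZE, 119.807) comes out at roughly 334 MiB/s, which is the cost paid for every object hashed in full.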
Thanks
Yuan