[sheepdog-users] High cpu usage by sheep (1/5)

Thu Jan 16 14:05:20 CET 2014

On 16.01.2014 07:06, Hitoshi Mitake wrote:
> Marcin Mirosław wrote:
>> 
>> Hi!
>> Today I ran `dog cluster check` and saw high CPU utilization on both
>> cores, by both kernel threads and userland. It took 30 seconds to
>> check 5GB of data, which gives about 170MB/s.
>> Next I tested `dog vdi check testowy`; again the speed was limited by
>> the CPU. It took 17s, which gives a speed of about 300MB/s.
> 
> Hi Marcin, sorry for my late reply.

Hi Hitoshi!
No problem, it's not an urgent question at all.

> Currently I cannot say whether the above result is too slow or not.
> But basically, VDI checking is not a light operation, because it
> determines the majority among the replicas of every object of the
> VDI. The basic scheme is as follows:
> 
> 1. read the SHA-1 value of every replica at an index (if copies == 3,
>    3 hash values are read)
> 2. compare them and decide the majority. If there are broken replicas,
>    dog recovers them from the correct one
> 3. increment the index and go to 1, until every replica is checked
> 
>> 
>> So I was wondering where the bottleneck is. I used a tool called "perf",
>> which I had never used before; I hope I used it correctly. I'll attach the
>> perf output at the end of this email. I'd like to ask you: is there room
>> for some optimization or not?
> 
> I think the SHA-1 calculation can be optimized. get_buffer_sha1()
> in lib/sha1.c is the function for it. In the above checking scheme,
> copies * (number of objects) SHA-1 calculations are done (exception:
> read-only objects have a static value, so we need to calculate it
> only once, the first time). Previously we had an optimized function,
> but it was buggy, so we replaced it with a safe but naive one.
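The cost described above can be put into numbers with a small C sketch. It assumes, hypothetically, that a per-replica digest cache exists so that read-only objects are hashed only on the first check; `struct replica_digest`, `needs_rehash()`, `sha1_calls()` and `bytes_hashed()` are illustrative names, not sheepdog functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SHA1_DIGEST_SIZE 20

/* Hypothetical per-replica cache entry (not a sheepdog structure). */
struct replica_digest {
	bool read_only;                         /* contents can never change */
	bool valid;                             /* digest[] already computed */
	unsigned char digest[SHA1_DIGEST_SIZE];
};

/* A writable object must be rehashed on every check; a read-only one
 * only the first time, after which its cached digest can be reused. */
bool needs_rehash(const struct replica_digest *rd)
{
	return !(rd->read_only && rd->valid);
}

/* SHA-1 computations over `checks` consecutive full checks, assuming
 * read-only digests are cached after the first check. */
size_t sha1_calls(size_t rw_objects, size_t ro_objects, size_t copies,
		  size_t checks)
{
	assert(copies > 0);
	if (checks == 0)
		return 0;
	return copies * (rw_objects * checks + ro_objects);
}

/* Bytes fed through SHA-1 by one full check: every replica is hashed. */
unsigned long long bytes_hashed(unsigned long long data_bytes, unsigned copies)
{
	return data_bytes * copies;
}
```

For example, with a hypothetical copies = 3, checking 5GB of data means feeding 15GB through SHA-1 across the cluster, which helps explain why a check is CPU-bound.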
> 
> If you can write an optimized version using a modern instruction set,
> it is definitely welcome :)

I can't even write "hello world" in C/C++ without looking at Wikipedia
(http://en.wikipedia.org/wiki/Hello_world_program) :/. I can search the web
for other SHA-1 implementations, but I can't do benchmarks. I found
http://nayuki.eigenstate.org/page/fast-sha1-hash-implementation-in-x86-assembly
, which includes a simple benchmark tool. I also found the thread
http://git.661346.n2.nabble.com/Linus-sha1-is-much-faster-td3448007.html
but I don't have enough skills to compare other implementations with the
ones from the first link and with the implementation used in sheepdog.
Also, SHA-1 performance can depend strongly on the compiler version and
the optimization flags. I think that if other SHA-1 implementations
aren't at least ~20 percent faster, there is no reason to touch it.
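A benchmark for this comparison does not need much code. The following is a minimal sketch, assuming each candidate (sheepdog's current code, the nayuki implementation, etc.) is first wrapped in a function of the hypothetical `hash_fn` shape; `hash_throughput_mbps()` and `worth_switching()` are illustrative names. It uses the C11 `timespec_get()` wall clock, so results on a busy machine will be noisy.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Assumed shape of a buffer-hashing routine; real candidates would
 * need a small wrapper to match it. */
typedef void (*hash_fn)(const void *buf, size_t len, unsigned char *out);

/* Wall-clock time in seconds via C11 timespec_get(). */
static double now_seconds(void)
{
	struct timespec ts;
	timespec_get(&ts, TIME_UTC);
	return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hash `buf` `rounds` times; return throughput in MB/s (1 MB = 10^6 B). */
double hash_throughput_mbps(hash_fn hash, const void *buf, size_t len, int rounds)
{
	unsigned char digest[64];   /* room for SHA-1's 20 bytes and more */
	double start, elapsed;

	assert(rounds > 0);
	start = now_seconds();
	for (int i = 0; i < rounds; i++)
		hash(buf, len, digest);
	elapsed = now_seconds() - start;
	if (elapsed <= 0.0)
		return 0.0;             /* run too short for the clock to resolve */
	return (double)len * rounds / elapsed / 1e6;
}

/* True if `candidate` beats `baseline` by at least `margin`
 * (0.20 corresponds to the ~20 percent threshold). */
int worth_switching(double candidate_mbps, double baseline_mbps, double margin)
{
	return candidate_mbps >= baseline_mbps * (1.0 + margin);
}
```

Timing every candidate on the same multi-megabyte buffer, built with the same compiler and flags, keeps the comparison fair; rebuilding with different `-O` levels would show how much of the difference comes from the compiler rather than from the algorithm.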
Hmm, what about the in-kernel engine (CRYPTO_SHA1 and CRYPTO_SHA1_SSSE3)?
Could it be faster than the implementation used in sheepdog? What would be
the disadvantages of using the kernel engine? With zero-copy there
shouldn't be much overhead.
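From userspace, the kernel's SHA-1 is reachable through the AF_ALG socket interface, which requires a kernel built with CONFIG_CRYPTO_USER_API_HASH. The sketch below is an illustration of that interface, not something sheepdog does; `kernel_sha1()` is a hypothetical helper name. Binding to "sha1" lets the kernel choose its highest-priority registered sha1 driver, which may be an SSSE3/AVX one if CRYPTO_SHA1_SSSE3 is available.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/if_alg.h>

/* Compute SHA-1 of buf through AF_ALG. Returns 0 on success, or -1 if
 * AF_ALG or the "sha1" algorithm is unavailable. */
int kernel_sha1(const void *buf, size_t len, unsigned char out[20])
{
	struct sockaddr_alg sa;
	const unsigned char *p = buf;
	int tfm, op = -1, ret = -1;

	assert(out != NULL);
	memset(&sa, 0, sizeof(sa));
	sa.salg_family = AF_ALG;
	strcpy((char *)sa.salg_type, "hash");   /* algorithm class */
	strcpy((char *)sa.salg_name, "sha1");   /* algorithm name */

	tfm = socket(AF_ALG, SOCK_SEQPACKET, 0);
	if (tfm < 0)
		return -1;
	if (bind(tfm, (struct sockaddr *)&sa, sizeof(sa)) < 0)
		goto out;
	op = accept(tfm, NULL, NULL);           /* one hashing operation */
	if (op < 0)
		goto out;

	/* MSG_MORE keeps the hash state open while data is streamed in. */
	while (len > 0) {
		ssize_t n = send(op, p, len, MSG_MORE);
		if (n <= 0)
			goto out;
		p += n;
		len -= (size_t)n;
	}
	/* A final send without MSG_MORE finalizes the digest. */
	if (send(op, p, 0, 0) < 0)
		goto out;
	if (read(op, out, 20) != 20)
		goto out;
	ret = 0;
out:
	if (op >= 0)
		close(op);
	close(tfm);
	return ret;
}
```

The `send()` loop copies the data into the kernel; the kernel's userspace crypto interface also accepts `vmsplice()`/`splice()` to avoid that copy. Possible disadvantages are several system calls per object, a dependency on the kernel configuration, and no guarantee of beating a well-optimized userspace implementation, so it would need measuring.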

I've just noticed that I didn't describe the second attachment
(vdi_read_perf.txt) correctly. In the second paragraph I was testing the
command `dog vdi _read_ testowy >/dev/null`, not `vdi check`.

Is the checksum only verified while checking a VDI or the cluster, and not during normal reads?

Thank you!