[sheepdog-users] High cpu usage by sheep

Fri Jan 17 12:26:24 CET 2014

On 16.01.2014 15:22, Liu Yuan wrote:
> On Thu, Jan 16, 2014 at 02:05:20PM +0100, Marcin Mirosław wrote:
>> On 16.01.2014 07:06, Hitoshi Mitake wrote:
>>> Marcin Mirosław wrote:
>>>>
>>>> Hi!
>>>> Today I ran `dog cluster check` and saw high CPU utilization on both
>>>> cores, both in a kernel thread and in userland. It took 30 seconds to
>>>> check 5GB of data, which gives about 170MB/s.
>>>> Next I tested `dog vdi check testowy`; again the speed was limited by
>>>> the CPU. It took 17s, which gives a speed of about 300MB/s.
>>>
>>> Hi Marcin, sorry for my late reply.
>>
>> Hi Hitoshi!
>> It's no problem, it's absolutely not an urgent question.
>>
>>> Currently I cannot say anything about the above result (whether it
>>> is too slow or not). But basically vdi checking is not a light
>>> operation, because it determines the majority among every replica of
>>> every object of the VDI. The basic scheme is as below:
>>>
>>> 1. read the sha1 value of every replica at an index (if copies == 3,
>>>    3 hash values are read)
>>> 2. compare and decide the majority. If there are broken replicas, dog
>>>    recovers them with the correct one
>>> 3. increment the index, goto 1, until every replica is checked
>>>
>>>>
>>>> So I was wondering where the bottleneck is. I used a tool called
>>>> "perf"; I have never used this tool before. I hope I used it
>>>> correctly. I'll attach the output of perf at the end of the email.
>>>> I'd like to ask you: is there room for some optimization or not?
>>>
>>> I think the calculation of the sha1 value can be optimized.
>>> get_buffer_sha1() in lib/sha1.c is the function for it. In the above
>>> checking scheme, copies * (the number of objects) sha1 calculations
>>> are done (exception: read-only objects would have their own static
>>> value, so we need to calculate it for them only once, the first
>>> time). Previously, we had an optimized function but it was buggy, so
>>> we replaced it with a safe but naive one.
>>>
>>> If you can write an optimized version with a modern instruction set,
>>> it is definitely welcome :)
>>
>> I can't even write "hello world" in C/C++ without looking at the wiki
>> (http://en.wikipedia.org/wiki/Hello_world_program) :/. I can search the
>> web for other implementations of sha1 but I can't do benchmarks. I found
>> http://nayuki.eigenstate.org/page/fast-sha1-hash-implementation-in-x86-assembly
>> , which includes a simple benchmark tool. I also found the thread
>> http://git.661346.n2.nabble.com/Linus-sha1-is-much-faster-td3448007.html
>> but I don't have enough skills to compare other implementations with
>> those from the first link and with the implementation used in sheepdog.
>> Also, the speed of computing sha1 can depend strongly on the compiler
>> version and the optimization flags. I think if other implementations of
>> sha1 aren't faster by ~20 percent then there is no reason to touch it.
>> Hmm, what about the in-kernel engine (CRYPTO_SHA1 and
>> CRYPTO_SHA1_SSSE3), can it be faster than the one used in sheepdog?
>> What would be the disadvantages of using the kernel engine? With
>> zero-copy there shouldn't be high overhead.
> 
> I already ported the kernel's hardware acceleration to sheepdog, see
> lib/sha1_ssse3.S. On x86 CPUs which support hardware-assisted
> acceleration, we already take advantage of it.

Hi!
So it looks like sha1 can't be calculated much faster.
Thank you for the information!
Marcin
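
For readers who want to see why the check is CPU-bound, the three-step
scheme Hitoshi describes above can be sketched roughly as follows. This is
only an illustration under assumptions, not sheepdog's actual code: the
function names check_object and check_vdi are hypothetical, replicas are
plain in-memory byte strings rather than stored objects, and requiring a
strict majority is an assumption. It does show that one sha1 calculation is
done per replica per object.

```python
import hashlib
from collections import Counter


def check_object(replicas):
    """Return the replicas of one object with broken copies repaired.

    Hypothetical helper: hashes every replica (step 1), picks the majority
    digest and overwrites minority replicas with a good copy (step 2).
    """
    # Step 1: one sha1 calculation per replica.
    digests = [hashlib.sha1(r).hexdigest() for r in replicas]

    # Step 2: decide the majority digest.
    majority_digest, votes = Counter(digests).most_common(1)[0]
    if votes * 2 <= len(replicas):
        # Assumption: without a strict majority we refuse to repair.
        raise ValueError("no majority among replicas")

    good = replicas[digests.index(majority_digest)]
    return [r if d == majority_digest else good
            for r, d in zip(replicas, digests)]


def check_vdi(objects):
    """Step 3: walk every object index and check all of its replicas."""
    return [check_object(replicas) for replicas in objects]


if __name__ == "__main__":
    # Two objects with copies == 3; the second has one corrupted replica.
    vdi = [
        [b"aaaa", b"aaaa", b"aaaa"],
        [b"bbbb", b"bbXb", b"bbbb"],
    ]
    print(check_vdi(vdi))
```

With copies == 3 and N objects this hashes 3 * N buffers, which matches the
"copies * (number of objects)" cost mentioned in the thread.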