[sheepdog-users] High cpu usage by sheep (1/5)

Liu Yuan namei.unix at gmail.com
Thu Jan 16 15:22:08 CET 2014


On Thu, Jan 16, 2014 at 02:05:20PM +0100, Marcin Mirosław wrote:
> On 16.01.2014 07:06, Hitoshi Mitake wrote:
> > Marcin Mirosław wrote:
> >> 
> >> Hi!
> >> Today I ran `dog cluster check` and saw high CPU utilization on both
> >> cores, from both kernel threads and userland. It took 30 seconds to
> >> check 5GB of data, which gives about 170MB/s.
> >> Next I tested `dog vdi check testowy`; again the speed was limited by CPU.
> >> It took 17s, which gives a speed of about 300MB/s.
> > 
> > Hi Marcin, sorry for my late reply.
> 
> Hi Hitoshi!
> No problem, it's not an urgent question at all.
> 
> > Currently I cannot say whether the above result is too slow or
> > not. But basically vdi checking is not a light operation,
> > because it determines the majority among all replicas of every
> > object of the VDI. The basic scheme is as follows:
> > 
> > 1. read the sha1 value of every replica at an index (if copies == 3, 3
> >    hash values are read)
> > 2. compare them and decide the majority. if there are broken replicas,
> >    dog recovers them from a correct one
> > 3. increment the index and go to 1, until every replica is checked
> > 
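
For readers following the thread, the three steps above can be sketched
roughly as below. This is an illustrative Python sketch, not sheepdog's
actual code (dog is written in C). The names `check_object` and `check_vdi`,
the in-memory lists of replicas, and the strict-majority rule are all
assumptions made for the example.

```python
import hashlib
from collections import Counter


def check_object(replicas):
    """Check one object index: hash every replica, pick the majority,
    and overwrite replicas that disagree with it.

    `replicas` is a hypothetical in-memory list of bytes objects, one per
    copy. Returns the positions of the replicas that were repaired.
    """
    # Step 1: one SHA-1 per replica (copies == 3 -> 3 digests).
    digests = [hashlib.sha1(data).digest() for data in replicas]

    # Step 2: decide the majority digest (assumed: strict majority needed).
    winner, votes = Counter(digests).most_common(1)[0]
    if votes * 2 <= len(replicas):
        raise ValueError("no majority among replicas")

    good = replicas[digests.index(winner)]
    repaired = []
    for i, digest in enumerate(digests):
        if digest != winner:
            replicas[i] = good  # recover the broken replica from a good one
            repaired.append(i)
    return repaired


def check_vdi(objects):
    """Step 3: walk every object index.

    Returns (number of SHA-1 calculations, {index: repaired positions}).
    """
    sha1_count = 0
    repaired = {}
    for index, replicas in enumerate(objects):
        sha1_count += len(replicas)
        fixed = check_object(replicas)
        if fixed:
            repaired[index] = fixed
    return sha1_count, repaired
```

The SHA-1 count returned by `check_vdi` grows as copies * (number of
objects), which is why hashing speed dominates the CPU cost of a check.
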
> >> 
> >> So I wondered where the bottleneck is. I used a tool called "perf",
> >> which I had never used before; I hope I used it correctly. I'll attach
> >> the output of perf at the end of this email. I'd like to ask you: is
> >> there room for some optimization or not?
> > 
> > I think the calculation of the sha1 value can be optimized.
> > get_buffer_sha1() in lib/sha1.c is the function for it. In the above
> > checking scheme, copies * (number of objects) sha1 calculations are
> > done (exception: read-only objects have their own static value,
> > so we need to calculate it for them only once, the first
> > time). Previously we had an optimized function, but it was buggy, so
> > we replaced it with a safe but naive one.
> > 
> > If you can write an optimized version using a modern instruction set,
> > it is definitely welcome :)
> 
> I can't even write "hello world" in C/C++ without looking at the wiki
> (http://en.wikipedia.org/wiki/Hello_world_program) :/. I can search the web
> for other implementations of sha1, but I can't do benchmarks. I found
> http://nayuki.eigenstate.org/page/fast-sha1-hash-implementation-in-x86-assembly
> , which has a simple benchmark tool. I also found the thread
> http://git.661346.n2.nabble.com/Linus-sha1-is-much-faster-td3448007.html
> but I have enough skills to compare other implementations with those
> from the first link and with the implementation used in sheepdog.
> So the slower sha1 speed may depend strongly on the compiler version
> and optimization flags. I think that if other implementations of
> sha1 aren't faster by ~20 percent, there is no reason to touch it.
> Hmm, what about the in-kernel engine (CRYPTO_SHA1 and CRYPTO_SHA1_SSSE3):
> can it be faster than the one used in sheepdog? What would be the
> disadvantages of using the kernel engine? With zero-copy there shouldn't
> be high overhead.
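
A rough way to get a throughput number for one SHA-1 implementation,
comparable to the MB/s figures earlier in the thread, is to time it over a
fixed buffer. The sketch below is Python and measures `hashlib.sha1`, not
sheepdog's get_buffer_sha1(). The function name `sha1_throughput_mb_s`, the
4 MB buffer size, and the number of rounds are assumptions chosen for
illustration.

```python
import hashlib
import time


def sha1_throughput_mb_s(buf_size=4 * 1024 * 1024, rounds=8):
    """Hash `rounds` buffers of `buf_size` bytes and return MB/s."""
    # Zero-filled buffer; SHA-1 does the same work whatever the content.
    buf = bytes(buf_size)
    start = time.perf_counter()
    for _ in range(rounds):
        hashlib.sha1(buf).digest()
    elapsed = time.perf_counter() - start
    return (buf_size * rounds) / (1024 * 1024) / elapsed


if __name__ == "__main__":
    print(f"{sha1_throughput_mb_s():.0f} MB/s")
```

Results from a single short run are noisy, so a difference of a few percent
between two implementations measured this way should not be trusted.
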

I have already ported the kernel's hardware-accelerated code to sheepdog, see
lib/sha1_ssse3.S. On x86 CPUs that support hardware-assisted acceleration, we
already take advantage of it.

Thanks
Yuan


