I found the bottleneck of my testing cluster: CPU. The cache is enabled on an SSD. Both qemu and sheep use a lot of CPU.

dd without oflag=direct:
  qemu 81%, sheep 49%, write speed ~150 MB/s

dd with oflag=direct:
  qemu 28%, sheep 69%, write speed ~80 MB/s

What do you think about it?

For comparison, dd on the host uses ~60% CPU and reaches 270 MB/s (without oflag=direct).
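One rough way to compare these runs is throughput per percent of CPU. The sketch below is only a back-of-the-envelope calculation using the figures from this post. It assumes the qemu and sheep percentages can simply be added together and that the speeds are in MB/s; the names `runs` and `mb_per_cpu_percent` are made up for illustration.

```python
# Figures copied from the measurements in this post.
# Assumption: qemu and sheep CPU percentages are summed per run,
# and write speeds are in MB/s.
runs = {
    "guest, buffered": {"cpu": 81 + 49, "mbps": 150},
    "guest, oflag=direct": {"cpu": 28 + 69, "mbps": 80},
    "host, buffered": {"cpu": 60, "mbps": 270},
}


def mb_per_cpu_percent(run):
    """Return write throughput divided by total CPU percentage."""
    return run["mbps"] / run["cpu"]


# Rounded to two decimals so the runs are easy to compare side by side.
efficiency = {name: round(mb_per_cpu_percent(r), 2) for name, r in runs.items()}

for name, value in efficiency.items():
    print(f"{name}: {value} MB/s per CPU percent")
```

By this measure the host run gets roughly four times as much throughput per unit of CPU as the buffered run through qemu and sheep, which fits the idea that CPU is the limit here.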