[sheepdog-users] Performance Impact of Recovery

Liu Yuan namei.unix at gmail.com
Wed Mar 12 09:32:31 CET 2014


On Wed, Mar 12, 2014 at 08:47:12AM +0100, richter at ecos.de wrote:
> Hi,
> 
> I have about 600GB in my sheepdog cluster. A recovery of one node started
> this morning and is at 20% after 2h, so it will take roughly another 8h.
> I would like to put more VMs in the cluster, but then recovery would take
> even longer.
> 
> During recovery there is a lot of disk read activity on the recovery node
> and on the other nodes because (as far as I understand) hashes for the data
> blocks are computed and compared. This disk activity is so heavy that other
> VMs, which are still running from qcow2 files, suffer a noticeable
> performance impact.
> 
> If I imagine having more data in the sheepdog cluster, a recovery might
> take 2 days, with all my VMs running slowly the whole time. My users will
> not accept this, but sometimes a recovery is necessary (e.g. after a
> server reboot or a sheep crash like this morning's).
> 
> So the question is: how can this recovery process be sped up?
> 

We have a patch to speed up recovery

* <efbf7f0> 2014-02-06 [Liu Yuan] sheep/recovery: multi-threading recovery process

which is merged in the master branch. I think this will speed up the recovery
process a lot, and the more disks you have, the better the speed-up.
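
To illustrate why more disks can help, here is a minimal Python sketch of
recovering objects in parallel. It is not the code from the commit above
(sheepdog is written in C), and everything in it is an assumption made for
illustration: the function names, the one-worker-per-disk layout, and the
`oid % num_disks` placement rule are all hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def recover_object(oid, disk):
    # Placeholder for the real work (hypothetical): fetch a replica
    # from a peer node and write it to the given local disk.
    return (disk, oid)


def recover_disk(disk, oids):
    # One worker handles all objects of one disk, one after another.
    return [recover_object(oid, disk) for oid in oids]


def parallel_recover(oids, num_disks):
    # Group objects by target disk (hypothetical placement rule).
    per_disk = defaultdict(list)
    for oid in oids:
        per_disk[oid % num_disks].append(oid)

    # Run one worker per disk so the disks are kept busy concurrently.
    with ThreadPoolExecutor(max_workers=num_disks) as pool:
        futures = [pool.submit(recover_disk, disk, group)
                   for disk, group in per_disk.items()]
        recovered = []
        for future in futures:
            recovered.extend(future.result())
    return recovered
```

With a layout like this, the number of objects each worker has to process
shrinks as disks are added, which is one plausible reason the speed-up
grows with the disk count.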

> 
> From my current knowledge (which is not too deep), my only idea would be to
> calculate the data block hashes when the data blocks are stored and to
> compare only the stored hashes. Would this be possible/make sense, or is
> there a better solution?

We already do it the way you suggested for the full replication scheme.
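
The idea of hashing at write time and comparing only stored hashes can be
sketched as follows. This is an illustration only, not sheepdog's actual
implementation: the `HashedStore` class, the `objects_to_copy` helper, and
the choice of SHA-1 are assumptions.

```python
import hashlib


class HashedStore:
    """Toy object store that records a digest whenever an object is written."""

    def __init__(self):
        self.data = {}
        self.digests = {}

    def write(self, oid, payload):
        # Compute the hash once, at write time, and keep it with the object.
        self.data[oid] = payload
        self.digests[oid] = hashlib.sha1(payload).hexdigest()

    def digest(self, oid):
        # Returns None if the object is not present locally.
        return self.digests.get(oid)


def objects_to_copy(local, remote):
    # Compare stored digests only; no object payload is read here.
    return sorted(oid for oid in remote.digests
                  if local.digest(oid) != remote.digest(oid))
```

During recovery, only the objects returned by `objects_to_copy` would need
to be read and transferred, so unchanged objects cause no data reads.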

Thanks
Yuan


