[sheepdog-users] node recovery performance

Valerio Pachera sirio81 at gmail.com
Tue Nov 26 11:49:55 CET 2013


2013/11/25 Gerald Richter - ECOS <richter at ecos.de>

> Hi,
>
> I have a simple test cluster with two nodes and one VDI of 26GB. If I
> restart one node, recovery takes 7,5 minutes. No VM was running during
> this time, so nothing changed inside the cluster, but the recovering
> node seems to pull all the data of the VDI from the other node, even
> though it already has all the data on the local disk.
>

It doesn't pull data from the other node unless necessary.
What it does is checksum all the objects on the node you restarted.
If any object is missing, it copies it from the other nodes.

This operation takes time even if it doesn't create network traffic.
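
As a rough illustration of why this costs time without touching the network,
here is a minimal sketch of a local checksum pass. It is not Sheepdog's actual
code: the function names, the flat directory layout, the `expected` map of
object name to digest, and the choice of SHA-1 are all assumptions made for
the example.

```python
import hashlib
from pathlib import Path


def sha1_of(path, chunk_size=1 << 20):
    """Return the hex SHA-1 of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def objects_to_fetch(local_dir, expected):
    """List objects that are missing locally or whose checksum differs.

    `expected` is a hypothetical dict mapping object name -> hex digest.
    Every object that exists locally is read in full from disk, which is
    where the time goes; only the returned names would need a network copy.
    """
    local_dir = Path(local_dir)
    to_fetch = []
    for name, digest in sorted(expected.items()):
        path = local_dir / name
        if not path.is_file() or sha1_of(path) != digest:
            to_fetch.append(name)
    return to_fetch
```

If every object is present and intact, the returned list is empty and nothing
is copied, but the whole local store has still been read once.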


>
> So I expect a cluster of 2,6GB will take 750 minutes, which is half a day.


Did you mean Tera?
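
Assuming tera was meant, and assuming the check time grows linearly with the
amount of data (a simplification; real timings depend on disks and load), the
750-minute figure follows from the 26GB / 7,5 minutes measurement. The helper
below is only a back-of-the-envelope sketch with a made-up name:

```python
def estimated_recovery_minutes(size_gb, ref_size_gb=26.0, ref_minutes=7.5):
    """Scale the measured recovery time linearly with the data size.

    The reference values are the ones reported in this thread
    (26 GB checked in 7.5 minutes); linear scaling is an assumption.
    """
    return size_gb * ref_minutes / ref_size_gb


# 2.6 TB taken as 2600 GB, i.e. 100 times the test VDI
minutes = estimated_recovery_minutes(2600)
print(minutes, "minutes =", minutes / 60, "hours")
```

With 2.6 GB instead, the same formula gives well under a minute.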



> If the second server fails in this time, data might be lost. So rebooting
> two servers within half a day might cause data loss... (it's the same
> for three or more nodes, only the timeframe changes a little bit).
>

If the second server powers off, nothing happens that is different from a
"standard" server being powered off.
When you turned off the first server, no recovery occurred.
The second server just kept working with 1 copy per object.