[sheepdog-users] Recovery Performance Question

Wed Sep 18 03:53:28 CEST 2013

At Thu, 12 Sep 2013 06:51:11 +0200,
Gerald Richter - ECOS wrote:
> 
> Hi,
> 
> I have about 500GB in my test cluster. Everything works fine. Now I have killed a node and restarted it. In this time no changes were made on any node.
> 
> As I expected, the restarted node starts a recovery. I can still access all data on the restarted node and all other nodes. So everything is fine for all VMs.
> 
> But what I discovered is that the recovery of the 500GB has taken
> about 6 hours and I guess that if the second node fails (my test
> cluster only has two nodes) during this 6 hours, I cannot access
> data anymore, until at least one node has recovered. 

Concurrent failures are not a problem as long as the number of
redundant copies is larger than the number of failed nodes.
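
As a rough illustration of that condition (this is not Sheepdog code;
the function name and its parameters are made up for this example, and
it assumes every object has the same number of up-to-date copies, each
on a different node):

```python
def data_survives(copies: int, failed_nodes: int) -> bool:
    """Return True if at least one copy of every object is still reachable.

    Assumption for this sketch: each object is stored `copies` times on
    distinct nodes, and all copies were up to date before the failures.
    """
    return failed_nodes < copies


# Two copies tolerate one failed node, but not two.
print(data_survives(copies=2, failed_nodes=1))
print(data_survives(copies=2, failed_nodes=2))
```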

> 
> So my question is what data is really moved during recovery. I understand that all objects are checked and moved from the old epoch to the new one, but since no data has changed, it should be enough to move some kind of pointer and not really copy the data. From the time it takes, it seems more like all data is really moved/copied?

Sheep keeps requests for objects that are still being recovered
pending. Once the objects are recovered, the requests are processed,
so data loss and corruption will not happen.
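
The idea can be sketched with a toy model like the one below. It is
only an illustration of "hold the request until the object is
recovered", not the actual sheep implementation; the class name, the
method names and the object IDs are all hypothetical.

```python
from collections import defaultdict


class RecoveryGate:
    """Toy model: hold requests for objects that are still being recovered."""

    def __init__(self, recovering_oids):
        self.recovering = set(recovering_oids)  # objects not yet recovered
        self.pending = defaultdict(list)        # oid -> queued requests
        self.processed = []                     # requests in the order they ran

    def submit(self, oid, request):
        if oid in self.recovering:
            # Object is not ready yet: keep the request pending.
            self.pending[oid].append(request)
        else:
            self.processed.append((oid, request))

    def object_recovered(self, oid):
        # Recovery of this object is done: run everything that was waiting.
        self.recovering.discard(oid)
        for request in self.pending.pop(oid, []):
            self.processed.append((oid, request))


gate = RecoveryGate({"obj-a"})
gate.submit("obj-a", "read")    # held back, obj-a is still recovering
gate.submit("obj-b", "write")   # runs immediately
gate.object_recovered("obj-a")  # the queued read runs now
```

In this model a request never sees a half-recovered object; it just
waits a bit longer.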

But as you say, the current slowness of the recovery process is a
problem. I have a work-in-progress local change to improve it. Making
recovery fast in a correct manner is not so easy, though, because
sheep must not let recovery traffic block ordinary requests from VMs.

I'd like to post the patches in the near future.

Thanks,
Hitoshi