[Sheepdog] sheepdog's recovery algorithm questions

MORITA Kazutaka morita.kazutaka at gmail.com
Tue Mar 15 11:14:11 CET 2011


On Tue, Mar 15, 2011 at 12:09 PM, jidalyg_8711 <jidalyg_8711 at 163.com> wrote:
> 1. Sheepdog claims to be strongly consistent. I think the implementation of
> read_object(), write_object(), and remove_object() ensures that strong
> consistency. Are there any other places that ensure it? How does it affect
> the performance of Sheepdog?

The mechanisms that ensure data consistency are:

- Sheepdog stores objects in a directory named after the epoch number,
  and doesn't allow clients to read objects from the directories of
  older epochs.
  This prevents clients from reading old data which are not

- Sheepdog allows write requests from at most one client.
  Administrators need to pay attention to this when using Sheepdog.
  This avoids write conflicts and makes it easy to keep objects
  consistent.
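
To illustrate the first point, here is a toy model in Python. It is only
a sketch and not Sheepdog code (Sheepdog itself is written in C); the
names EpochStore, StaleEpochError, and bump_epoch are made up for this
example. It shows objects being kept per epoch and reads against an old
epoch being refused.

```python
class StaleEpochError(Exception):
    """Hypothetical error: the request names an epoch older than the current one."""


class EpochStore:
    """Toy model: one in-memory 'directory' per epoch number."""

    def __init__(self):
        self.epoch = 1
        self.dirs = {1: {}}  # epoch number -> {object id: data}

    def write(self, oid, data):
        # Writes always land in the current epoch's directory.
        self.dirs[self.epoch][oid] = data

    def bump_epoch(self):
        # Cluster membership changed: open a new, initially empty directory.
        self.epoch += 1
        self.dirs[self.epoch] = {}

    def read(self, oid, epoch=None):
        # A request that names an old epoch is rejected instead of being
        # served from the old directory.
        epoch = self.epoch if epoch is None else epoch
        if epoch != self.epoch:
            raise StaleEpochError(
                "request epoch %d, current epoch %d" % (epoch, self.epoch))
        return self.dirs[epoch][oid]
```

In this model a read that names epoch 1 after bump_epoch() raises
StaleEpochError, even though the data is still present in the old
directory.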

> 2. About the recovery algorithm: when a node joins or leaves, Sheepdog
> calls start_recovery() and recovers in the background. The main action of
> recovery is to move objects to the new node and into the new epoch
> directory. If a READ_OBJECT request arrives during recovery, sys->epoch
> has already increased, but the object may still be in the old epoch
> directory and not yet moved to the new one. How does Sheepdog handle this
> situation? Note that sys->epoch is increased in update_cluster_status()
> before start_recovery() is called.

Before processing an object request, Sheepdog checks in
is_recoverying_oid() whether the requested object has already been
recovered to the current epoch directory.  If it has not been recovered
yet, Sheepdog recovers that object first and then processes the request.
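
As a rough illustration of this recover-on-demand behaviour, here is a
toy model in Python. It is a sketch under simplifying assumptions and
not Sheepdog code: the Node class and its methods are invented for the
example, there is a single node, and "recovery" just copies the object
from the newest older epoch directory that holds it.

```python
class Node:
    """Toy single-node model of recovering an object before serving it."""

    def __init__(self, objects):
        self.epoch = 1
        self.dirs = {1: dict(objects)}  # epoch number -> {object id: data}
        self.pending = set()            # ids not yet in the current epoch dir

    def membership_changed(self):
        # The epoch is increased first; recovery of the objects follows.
        # Anything in the previous directory now needs recovering, along
        # with whatever was still pending from an earlier change.
        self.pending |= set(self.dirs[self.epoch])
        self.epoch += 1
        self.dirs[self.epoch] = {}

    def is_recovering(self, oid):
        return oid in self.pending

    def recover_one(self, oid):
        # Copy from the newest older epoch directory that has the object.
        for e in range(self.epoch - 1, 0, -1):
            if oid in self.dirs[e]:
                self.dirs[self.epoch][oid] = self.dirs[e][oid]
                break
        self.pending.discard(oid)

    def read(self, oid):
        # Recover the requested object first if needed, then serve it
        # from the current epoch directory.
        if self.is_recovering(oid):
            self.recover_one(oid)
        return self.dirs[self.epoch][oid]
```

In this model, a read that arrives right after membership_changed()
still succeeds: the object is pulled into the current epoch directory on
demand, while other objects stay pending until they are requested or
recovered later.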


