[Sheepdog] sheepdog's recovery algorithm questions

Tue Mar 15 11:14:11 CET 2011

Hi,

On Tue, Mar 15, 2011 at 12:09 PM, jidalyg_8711 <jidalyg_8711 at 163.com> wrote:
> 1. Sheepdog claims to be strongly consistent, and I think the
> implementation of read_object(), write_object() and remove_object()
> ensures that strong consistency?  Are there any other places that ensure
> it?  How does it affect the performance of Sheepdog?

The mechanisms that ensure data consistency are:

- Sheepdog stores objects in a directory named after the epoch number,
  and doesn't allow clients to read objects from old epoch number
  directories.  This prevents clients from reading stale data.

- Sheepdog allows write requests from at most one client.
  Administrators need to pay attention to this when using Sheepdog.
  This avoids write conflicts and makes it easy to keep objects
  consistent.
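
The epoch directory rule in the first item can be sketched as follows.
This is only an illustration, not the actual Sheepdog code: the path
layout, epoch_obj_path() and check_request_epoch() are hypothetical
names and assumptions made for this example.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Build the path of an object inside an epoch directory.
 * Assumed layout (hypothetical): <base>/<epoch, 8 digits>/<oid, 16 hex digits>
 */
static int epoch_obj_path(char *buf, size_t len, const char *base,
                          uint32_t epoch, uint64_t oid)
{
    return snprintf(buf, len, "%s/%08u/%016llx", base,
                    (unsigned int)epoch, (unsigned long long)oid);
}

/*
 * Refuse a request that refers to any epoch other than the current one.
 * Returns 0 if the request may be served, -1 if it must be rejected.
 */
static int check_request_epoch(uint32_t sys_epoch, uint32_t req_epoch)
{
    return req_epoch == sys_epoch ? 0 : -1;
}
```

With this sketch, a reader holding an old epoch number never gets a path
into an old directory: the request is rejected before any file is
opened.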

>
> 2. About the recovery algorithm: when a new node joins or a node
> leaves, Sheepdog calls start_recovery() and recovers in the background.
> The main action of recovery is to move objects to the new node and the
> new epoch directory.  If a READ_OBJECT request arrives during recovery,
> sys->epoch has already increased, but the object may still be in the old
> epoch directory, not yet moved to the new epoch.  How does Sheepdog
> handle this situation?  Note that sys->epoch is increased in
> update_cluster_status() before start_recovery() is called.

Before processing an object request, Sheepdog checks in
is_recoverying_oid() whether the requested object has already been
recovered to the current epoch directory.  If the object has not been
recovered yet, Sheepdog recovers it first and then processes the
request.
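
The recover-before-serve order described above can be sketched like
this.  It is a simplified in-memory model, not the real implementation:
struct node, needs_recovery(), recover_obj() and handle_request() are
hypothetical names for this example, and the real check is done in
is_recoverying_oid().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_OBJS 4

struct obj_state {
    uint64_t oid;
    uint32_t epoch;     /* epoch directory the object currently lives in */
};

struct node {
    uint32_t epoch;     /* current cluster epoch */
    struct obj_state objs[NR_OBJS];
};

static struct obj_state *find_obj(struct node *n, uint64_t oid)
{
    for (size_t i = 0; i < NR_OBJS; i++)
        if (n->objs[i].oid == oid)
            return &n->objs[i];
    return NULL;
}

/* True if the object exists but is still in an older epoch directory. */
static bool needs_recovery(struct node *n, uint64_t oid)
{
    struct obj_state *o = find_obj(n, oid);
    return o != NULL && o->epoch < n->epoch;
}

/* Stand-in for moving the object into the current epoch directory. */
static void recover_obj(struct node *n, uint64_t oid)
{
    struct obj_state *o = find_obj(n, oid);
    if (o != NULL)
        o->epoch = n->epoch;
}

/* Recover the object on demand, then serve the request. */
static int handle_request(struct node *n, uint64_t oid)
{
    if (find_obj(n, oid) == NULL)
        return -1;              /* unknown object */
    if (needs_recovery(n, oid))
        recover_obj(n, oid);    /* recover first ... */
    return 0;                   /* ... then process the request */
}
```

In this model the request for a not yet recovered object simply jumps
the queue: it is recovered immediately instead of waiting for the
background recovery to reach it.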

Thanks,

Kazutaka