[sheepdog] [PATCH 9/9] sheep: show error message when object may be lost

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Tue May 7 09:43:09 CEST 2013


At Tue, 07 May 2013 15:13:12 +0800,
Liu Yuan wrote:
> 
> +		case SD_RES_NO_OBJ:
> +			/*
> +			 * No object means that there was no write success at
> +			 * this epoch.
> +			 */
> +			data_lost = false;
> +			/* fall through */
> 
> So if A, B, C all return SD_RES_NO_OBJ, you set data_lost = false, in
> this case, we don't print an error, no?

I set data_lost to false even when only one of the nodes returns
SD_RES_NO_OBJ.

Write requests are successful only when all the replicas are updated.
This means that if any node returns SD_RES_NO_OBJ, we can guarantee
that no write request succeeded at that epoch, and we can safely use
the older replicas.
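
As a rough sketch of the rule above (not the code in the patch): the
enum values and may_be_lost() below are hypothetical names made up for
illustration, with REPLY_NO_OBJ standing in for SD_RES_NO_OBJ.  It
assumes we have already collected one reply per replica for a single
epoch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical per-replica replies for one epoch.  REPLY_NO_OBJ plays
 * the role of SD_RES_NO_OBJ; the other two are illustrative only.
 */
enum reply { REPLY_OK, REPLY_NO_OBJ, REPLY_UNREACHABLE };

/*
 * Returns true when the object may have been lost at this epoch, i.e.
 * no replica could be read and no reachable replica said "no object".
 */
static bool may_be_lost(const enum reply *replies, size_t n)
{
	bool data_lost = true;
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++) {
		if (replies[i] == REPLY_OK)
			return false;	/* the object was read */
		if (replies[i] == REPLY_NO_OBJ)
			data_lost = false; /* no write completed here */
	}
	return data_lost;
}
```

A single REPLY_NO_OBJ among otherwise unreachable replicas is enough to
make this return false, which is the point of the paragraph above.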

For example,

 Epoch  Nodes
 1      [A, B, C, D]        <- A, B, and C have the object X.
 2      [A, B, C, D, E]     <- B, C, and E are in charge of X, but E hasn't
                               recovered X yet.
 3      [A, C, D, E]
 4      [A, D, E]           <- B and C, which held X at epoch 2, have gone away

In this case,

 - A tries to recover X from C, D, and E at epoch 3 first, but no
   object is recovered at epoch 3.  C, D, and E return SD_RES_NO_OBJ
   and we can safely try the older epoch.

 - A tries to recover X from B, C, and E at epoch 2.  A cannot connect
   to B and C, and E returns SD_RES_NO_OBJ.  In this case, there is no
   need to consider that X was updated at epoch 2, because if it had
   been updated from X to X', E would have X'.

 - Now A can safely read X from A, B, or C at epoch 1.
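
The three steps above can be replayed with a small sketch.  This is
only an illustration under assumptions: struct epoch_view,
recover_epoch(), and the PROBE_* values are hypothetical names, the
replica count of 3 is taken from the example, and the per-node answers
are copied from the walk-through rather than from real sheep behavior.

```c
#include <assert.h>
#include <stddef.h>

#define NR_COPIES 3	/* replicas per object in the example */

/* Hypothetical answers A gets; PROBE_NO_OBJ mirrors SD_RES_NO_OBJ. */
enum probe { PROBE_FOUND, PROBE_NO_OBJ, PROBE_DOWN };

struct epoch_view {
	int epoch;
	const char *nodes[NR_COPIES];	/* replicas asked at this epoch */
	enum probe result[NR_COPIES];	/* what each of them answers */
};

/* The example from this mail, newest epoch first. */
static const struct epoch_view example[] = {
	{ 3, { "C", "D", "E" }, { PROBE_NO_OBJ, PROBE_NO_OBJ, PROBE_NO_OBJ } },
	{ 2, { "B", "C", "E" }, { PROBE_DOWN, PROBE_DOWN, PROBE_NO_OBJ } },
	{ 1, { "A", "B", "C" }, { PROBE_FOUND, PROBE_DOWN, PROBE_DOWN } },
};

/*
 * Walks the epochs from newest to oldest.  Returns the epoch at which
 * the object is found, -1 if an epoch has no readable replica and no
 * "no object" answer (the object may be lost), or 0 if every epoch
 * answered "no object".
 */
static int recover_epoch(const struct epoch_view *v, size_t n)
{
	size_t i, j;

	assert(v != NULL);
	for (i = 0; i < n; i++) {
		int no_obj = 0;

		for (j = 0; j < NR_COPIES; j++) {
			if (v[i].result[j] == PROBE_FOUND)
				return v[i].epoch;
			if (v[i].result[j] == PROBE_NO_OBJ)
				no_obj = 1;
		}
		if (!no_obj)
			return -1; /* nobody can rule out a write here */
	}
	return 0;
}
```

With the table above, recover_epoch(example, 3) falls back past epochs
3 and 2 and stops at epoch 1.  If E were also unreachable at epoch 2,
the walk would stop there with -1 instead of silently using the copy
from epoch 1.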

Thanks,

Kazutaka


