[Sheepdog] [PATCH v3 1/7] sheep: add transient failure detection

Liu Yuan namei.unix at gmail.com
Thu May 3 10:37:19 CEST 2012


On 05/03/2012 04:29 PM, HaiTing Yao wrote:

>      Epoch  Nodes
>         1  A, B, C, D
>         2  A, B, C       <- node D fails temporarily
>         3  A, B, C, D
> 
>     If object recovery doesn't run at epoch 2, there is no object movement
>     between nodes.  I know that handling a transient network partition is a
>     challenging problem with the current implementation, but I'd like to
>     see another approach that doesn't block I/O for a long time.
> 
>  
> In my tests, recovery has usually already started by the time epoch 3 arrives.


I think it depends on how soon the node comes back. If it returns within
the window in which the succeeding recovery supersedes the previous one,
there is no object migration overhead at all.
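A toy model may make the point concrete. The sketch below assumes a plain consistent-hash ring with virtual nodes, loosely in the spirit of sheep's object placement; the names VNODES, NR_OBJS, place() and count_moves(), the SHA-1-derived ring positions and the 64-vnode count are all illustrative assumptions, not sheep's actual code. It counts how many objects change owner when epoch 2's recovery runs to completion, compared with when epoch 3's recovery supersedes it before any object has moved.

```python
import bisect
import hashlib

VNODES = 64      # hypothetical number of virtual nodes per node
NR_OBJS = 1000   # hypothetical number of objects in the cluster


def _h(key: str) -> int:
    """64-bit ring position derived from SHA-1 (a stand-in hash)."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


def build_ring(nodes):
    """Sorted (position, node) pairs for every vnode of every node."""
    return sorted((_h(f"{n}-{v}"), n) for n in nodes for v in range(VNODES))


def place(oid, ring):
    """Owner of oid: the first vnode clockwise from the object's position."""
    pos = _h(f"obj-{oid}")
    i = bisect.bisect_left(ring, (pos, ""))
    return ring[i % len(ring)][1]  # wrap around past the last vnode


def count_moves(old_nodes, new_nodes):
    """Number of objects whose owner differs between two memberships."""
    old, new = build_ring(old_nodes), build_ring(new_nodes)
    return sum(place(o, old) != place(o, new) for o in range(NR_OBJS))


epochs = {
    1: ["A", "B", "C", "D"],
    2: ["A", "B", "C"],        # D drops out temporarily
    3: ["A", "B", "C", "D"],   # D is back
}

# Epoch 2's recovery finished before epoch 3: objects go 1 -> 2 -> 3.
eager = count_moves(epochs[1], epochs[2]) + count_moves(epochs[2], epochs[3])
# Epoch 3's recovery superseded epoch 2's before anything moved, so only
# the epoch-1 layout has to be reconciled with the epoch-3 layout.
superseded = count_moves(epochs[1], epochs[3])

print("moves if epoch 2 recovery completes:", eager)
print("moves if superseded:", superseded)
```

Because epoch 3 has the same membership as epoch 1, every object maps back to its epoch-1 owner, so the superseded case needs no migration, while the completed case moves D's objects twice: off D and then back again.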

Thanks,
Yuan


