[Sheepdog] [PATCH v3 1/7] sheep: add transient failure detection
    Liu Yuan 
    namei.unix at gmail.com
       
    Thu May  3 10:37:19 CEST 2012
    
    
  
On 05/03/2012 04:29 PM, HaiTing Yao wrote:
>      Epoch  Nodes
>         1  A, B, C, D
>         2  A, B, C       <- node D fails temporarily
>         3  A, B, C, D
> 
>     If object recovery doesn't run at epoch 2, no objects move
>     between nodes.  I know that handling a transient network partition is a
>     challenging problem with the current implementation, but I'd like to
>     see another approach which doesn't block I/Os for a long time.
> 
>  
> In my tests, recovery has usually already begun running by the time epoch 3 arrives.
I think it depends on how soon the node comes back. If it returns within
the window in which the succeeding recovery supersedes the previous one,
there isn't any object migration overhead.
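
A rough sketch of the point, not sheepdog code: the place() function below
is a hypothetical hash-ranked placement with two copies per object, standing
in for the real consistent hashing, and moves() is a made-up helper that
counts the object copies a recovery would have to create. It compares a
recovery that runs to completion at epoch 2 and again at epoch 3 against one
that is superseded before it moves anything.

```python
import hashlib

def place(oid, nodes, copies=2):
    # Hypothetical placement: rank nodes by hash(oid, node), keep the first `copies`.
    ranked = sorted(
        nodes,
        key=lambda n: hashlib.sha1(f"{oid}:{n}".encode()).hexdigest(),
    )
    return set(ranked[:copies])

def moves(oids, old_nodes, new_nodes):
    # Count object copies that must be created when membership changes.
    return sum(len(place(o, new_nodes) - place(o, old_nodes)) for o in oids)

oids = range(1000)
epoch1 = ["A", "B", "C", "D"]
epoch2 = ["A", "B", "C"]       # node D fails temporarily
epoch3 = ["A", "B", "C", "D"]  # node D is back

# Recovery completes at epoch 2, then runs again at epoch 3.
full = moves(oids, epoch1, epoch2) + moves(oids, epoch2, epoch3)

# Recovery at epoch 2 is superseded before moving anything,
# so only the difference between epoch 1 and epoch 3 matters.
superseded = moves(oids, epoch1, epoch3)

print("copies created, both recoveries run:", full)
print("copies created, epoch 2 superseded:", superseded)
```

Since the membership at epoch 3 equals the one at epoch 1, the superseded
case creates no copies under this model, while the other case pays for each
affected object twice.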
Thanks,
Yuan
    
    