[Sheepdog] [PATCH v3 1/7] sheep: add transient failure detection
Liu Yuan
namei.unix at gmail.com
Thu May 3 10:37:19 CEST 2012
On 05/03/2012 04:29 PM, HaiTing Yao wrote:
> Epoch  Nodes
>   1    A, B, C, D
>   2    A, B, C     <- node D fails temporarily
>   3    A, B, C, D
>
> If object recovery doesn't run at epoch 2, no objects move between
> nodes. I know that handling a transient network partition is a
> challenging problem with the current implementation, but I'd like to
> see another approach which doesn't block I/Os for a long time.
>
>
> In my tests, recovery has usually already begun running by the time
> epoch 3 arrives.
I think it depends on how soon the node comes back. If it returns within
the window in which the succeeding recovery supersedes the previous one,
there isn't any object migration overhead.
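
As a rough illustration of that point (not Sheepdog's actual placement or
recovery code), the sketch below replays the three epochs from the quoted
table. It assumes a simple rendezvous hash as a stand-in for the real
object placement, and the helper names owner(), placement() and moved()
are hypothetical. It compares the objects that would migrate if recovery
ran at every epoch with those that would migrate if the epoch 3 recovery
superseded the epoch 2 one.

```python
import hashlib

def owner(oid, nodes):
    """Pick the node with the highest hash score for this object.

    Rendezvous hashing, used here only as an assumed stand-in for the
    real placement function.
    """
    def score(node):
        return hashlib.sha1(f"{node}:{oid}".encode()).hexdigest()
    return max(nodes, key=score)

def placement(oids, nodes):
    """Map every object id to its owning node for one membership list."""
    return {oid: owner(oid, nodes) for oid in oids}

def moved(before, after):
    """Return the object ids whose owner differs between two placements."""
    return [oid for oid in before if before[oid] != after[oid]]

# Membership per epoch, taken from the table in the quoted mail.
epochs = {
    1: ["A", "B", "C", "D"],
    2: ["A", "B", "C"],       # node D fails temporarily
    3: ["A", "B", "C", "D"],  # node D comes back
}
oids = list(range(1000))      # hypothetical object ids

p1 = placement(oids, epochs[1])
p2 = placement(oids, epochs[2])
p3 = placement(oids, epochs[3])

# Recovery runs at every epoch: objects leave D, then return to D.
ran_each_epoch = moved(p1, p2) + moved(p2, p3)

# Epoch 3 recovery supersedes epoch 2: only the 1 -> 3 difference matters.
superseded = moved(p1, p3)

print("moves if recovery ran at epochs 2 and 3:", len(ran_each_epoch))
print("moves if epoch 3 superseded epoch 2:", len(superseded))
```

Under these assumptions the membership at epoch 3 equals that of epoch 1,
so the superseded case has nothing to move, while the other case moves
every object that lived on D twice.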
Thanks,
Yuan