[sheepdog] the use case for leave_list/leave_nodes

Liu Yuan namei.unix at gmail.com
Thu Jun 7 04:29:16 CEST 2012


On 06/07/2012 06:38 AM, Christoph Hellwig wrote:

> I'm trying to understand the use case for the leave_list and all code
> associated with it.
> 
> From my reading the intention is to allow a cluster to start as long
> as all the original nodes tried to join the cluster.  What makes an
> original node that tried to join the cluster but failed special over
> one that never tried to join?  It's not going to help us with getting
> copies from it without a manual recover at least.


It is designed for the case where nodes might have different epochs and
we don't know which node has the mastership, for example, a power
failure of the whole cluster during the recovery stage, or a corrupted
epoch file in a cluster that was shut down. The current code provides
only limited support for these situations, and I think we need more
work on it.

Thanks,
Yuan


