[sheepdog] [PATCH 0/9] allow new nodes joining before all old sheep are up

Liu Yuan namei.unix at gmail.com
Wed Jun 13 13:36:03 CEST 2012


On 06/13/2012 07:27 PM, Christoph Hellwig wrote:
> On Wed, Jun 13, 2012 at 10:42:03AM +0800, Liu Yuan wrote:
>> On 06/12/2012 11:09 PM, Christoph Hellwig wrote:
>>> This series allows starting new sheep when the cluster is in WAIT_FOR_JOIN
>>> status.  They still don't count towards the number of required old sheep,
>>> but will join the cluster as soon as old sheep have been seen.
>>>
>>
>> I am not against the patch set per se (it does many more cleanups than
>> the title advertises), but I just wonder whether there is any real use
>> case for adding new sheep while we are recovering a crashed cluster?
> 
> The prime use case is to speed up recovering a cluster in that case.
> Right now it requires manual intervention, which usually means some
> sort of periodic restart.  Handling it correctly from the start not
> only generally makes this case faster, it also simplifies the management
> layer a bit.
> 

I haven't tried running the patch set yet. So you mean that with it we
no longer need to restart nodes that belong to the old configuration but
failed to join, say because of a mismatched epoch, since they now stay
around instead of leaving? And once all the nodes have joined, they
agree on the highest epoch and start running again? If so, this looks
much better than the old join & exit approach for a crashed cluster. In
that case 'new sheep' in the intro is kind of misleading.
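
To make my reading concrete, here is a rough Python sketch of the
behaviour I am describing. It is not taken from the patch set: the Node
type, try_start_cluster() and the node names are all hypothetical, and I
am assuming the cluster simply waits until every node of the old
configuration is present and then takes the highest epoch seen.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    epoch: int  # latest epoch this node has recorded (hypothetical field)


def try_start_cluster(expected, joined):
    """Return the agreed epoch once all expected nodes are present.

    expected: set of node names from the old configuration.
    joined:   list of Node objects seen so far.
    Returns None while the cluster still has to wait.
    """
    present = {n.name for n in joined}
    if not set(expected) <= present:
        # Still waiting: nodes with a stale epoch stay joined
        # instead of exiting and needing a manual restart.
        return None
    # Everyone is here, so settle on the highest epoch among old nodes.
    return max((n.epoch for n in joined if n.name in expected), default=None)


# Example: node "b" has a stale epoch but is not kicked out.
expected = {"a", "b", "c"}
joined = [Node("a", 5), Node("b", 3)]
waiting = try_start_cluster(expected, joined)

joined.append(Node("c", 4))
agreed = try_start_cluster(expected, joined)
```

In this model nothing happens until the last old node shows up; a node
with a lower epoch just waits rather than leaving.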

Thanks,
Yuan



