[sheepdog] [PATCH v3] sheep: remove master node

Kai Zhang kyle at zelin.io
Wed Jul 24 13:37:53 CEST 2013


On Jul 24, 2013, at 5:40 PM, MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp> wrote:

> At Wed, 24 Jul 2013 17:20:51 +0800,
> Kai Zhang wrote:
>> 
>> 
>> On Jul 24, 2013, at 5:13 PM, Kai Zhang <kyle at zelin.io> wrote:
>> 
>>> 
>>> On Jul 24, 2013, at 2:53 PM, MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp> wrote:
>>> 
>>>> At Tue, 23 Jul 2013 17:30:03 +0800,
>>>> Kai Zhang wrote:
>>>>> 
>>>>> On Jul 23, 2013, at 4:44 PM, MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp> wrote:
>>>>> 
>>>>>> Ah, sorry.  The node A doesn't start until the nodes B, C, and D come
>>>>>> back.  It is because the latest epoch in the node A includes B, C, and
>>>>>> D.
>>>>> 
>>>>> Well, it seems I didn't fully understand the current implementation of cluster driver.
>>>>> 
>>>>> A very silly question: if B, C come back but D does not, what is the status of 
>>>>> the cluster? Can it work, or does it just wait for D?
>>>> 
>>>> The cluster status will be SD_STATUS_WAIT.  It will wait for the node
>>>> D to join Sheepdog if you don't run "collie cluster recover force".
>>>> 
>>> 
>>> Does this mean that sheepdog is not self-healing?
>>> Will any persistent failure of a sheep have to be handled by the administrator?
>> 
>> Sorry, my description is not correct.
>> What I mean is that the sheepdog cluster cannot recover by itself in this scenario.
>> And I'm a little disappointed by this.
>> Is there any possibility of solving this?
> 
> If the redundancy level is 1, it is possible that only node D
> has the latest data.  Then, it's not safe to start sheepdog
> automatically without node D.
> 

If the number of concurrently lost sheep is larger than the number of replicas, data is bound to be lost.
I think that is reasonable and acceptable, and we have no choice but to increase the number of replicas.
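To make the condition concrete, here is a minimal sketch in C (my own illustration, not sheepdog code): each object is replicated on nr_copies distinct nodes, so losing that many nodes at once can take out every copy of some objects, while losing fewer always leaves at least one copy somewhere in the cluster.

#include <stdbool.h>

/* Illustrative only: true when simultaneously losing this many nodes
 * can destroy all copies of some object, assuming each object keeps
 * nr_copies replicas on distinct nodes. */
static bool data_loss_possible(int concurrently_lost_nodes, int nr_copies)
{
        return concurrently_lost_nodes >= nr_copies;
}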

If we have to start the cluster manually, this sacrifices availability.

Another scenario I have in mind: if we shut down all the sheep and then restart them, we have to bring back every one of them,
otherwise the cluster will not work.
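As far as I understand the rule discussed in this thread, the check that keeps a restarted cluster in SD_STATUS_WAIT looks roughly like the sketch below (the names are mine, not the actual sheepdog functions): the cluster may only leave the wait state once every node recorded in the latest epoch has joined again, unless the administrator forces it with "collie cluster recover force".

#include <stdbool.h>
#include <stddef.h>

struct node_id {
        int id; /* stands in for the real host/port identity of a sheep */
};

static bool node_present(const struct node_id *n,
                         const struct node_id *joined, size_t nr_joined)
{
        for (size_t i = 0; i < nr_joined; i++)
                if (joined[i].id == n->id)
                        return true;
        return false;
}

/* Leave SD_STATUS_WAIT only when every member of the latest epoch is back. */
static bool can_leave_wait_state(const struct node_id *epoch_nodes,
                                 size_t nr_epoch,
                                 const struct node_id *joined, size_t nr_joined)
{
        for (size_t i = 0; i < nr_epoch; i++)
                if (!node_present(&epoch_nodes[i], joined, nr_joined))
                        return false;
        return true;
}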

> Sheepdog starts if all the nodes in the previous epoch are gathered -
> this is necessary to keep the strong consistency required for a
> block storage system.  We can relax this rule a bit (e.g. it is okay
> to start sheepdog in the above example if the redundancy level is
> larger than one).  It's on my TODO list.
> 
> 

Based on my statement above, there is still a risk of losing data.
We cannot avoid it entirely.
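If I read your TODO item correctly, the relaxation could be expressed as a small change to the check above (again just a sketch under my own assumptions): tolerate missing nodes from the latest epoch as long as their number stays below the redundancy level, because at least one copy of every object is then still on a node that came back. Once the number of missing nodes reaches the redundancy level, the risk cannot be ruled out.

/* Sketch of the relaxed rule: allow startup while the nodes still
 * missing from the latest epoch are fewer than nr_copies, assuming the
 * joined nodes are a subset of the epoch members. */
static bool can_start_relaxed(size_t nr_epoch_nodes, size_t nr_joined_nodes,
                              int nr_copies)
{
        size_t nr_missing = nr_epoch_nodes - nr_joined_nodes;

        return nr_missing < (size_t)nr_copies;
}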

BTW, I think the redundancy level should be bound to a specific vdi, not to the cluster as a whole.
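To illustrate the idea (a hypothetical sketch, not a patch against the current code): the copy count would live in each vdi's metadata instead of being a single value chosen when the cluster is formatted, so different vdis could trade space for safety independently.

#include <stdint.h>

/* Hypothetical per-vdi metadata: each vdi carries its own copy count
 * rather than inheriting one cluster-wide redundancy level. */
struct vdi_metadata {
        uint32_t vdi_id;
        uint8_t  nr_copies;     /* redundancy level for this vdi only */
};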

Thanks,
Kyle



