[sheepdog] [PATCH 0/3] zookeeper: fix error handling

Liu Yuan namei.unix at gmail.com
Thu May 30 15:53:16 CEST 2013


On 05/30/2013 09:46 PM, MORITA Kazutaka wrote:
> At Thu, 30 May 2013 21:26:26 +0800,
> Liu Yuan wrote:
>>
>> On 05/30/2013 09:16 PM, MORITA Kazutaka wrote:
>>> Although ZooKeeper is prone to timeouts, the problem is not
>>> specific to the zookeeper driver.  I think this should be fixed in
>>> sheep/group.c.
>>>
>>
>> Compared to the corosync and local drivers, zk is the only one with
>> a concept of timeout, no? With this in mind, I think putting the
>> timeout handling into sheep just complicates the sheep code. I don't
>> see why timeouts should be handled in an upper layer in sheep. What
>> is the benefit?
> 
> Corosync with many nodes also has a similar problem (temporary split
> brain), and I believe we will suffer from the same problem with
> shepherd.  Of course, I don't intend to introduce complex code.  If
> my implementation becomes complex, I'll give up the idea.
> 

Well, a network partition looks to me like a different problem from a
timeout. Suppose the network is split into A and B: are all the nodes
in A and B then going to try to rejoin each other's partition?

Thanks,
Yuan
