On Tue, May 29, 2012 at 12:04 AM, Liu Yuan <namei.unix at gmail.com> wrote:
> On 05/28/2012 11:54 PM, Yunkai Zhang wrote:
>
>>> I got a bug report where the nr_sd_nodes == nr_zk_nodes assert in
>>> build_node_list is triggered by a large number of sheep joining at the
>>> same time.
>> We should not start sheep at the same time. Have you read the commit log
>> of this patch: 8567aae281c75502c0a267bf76b771a2af8392f2 ?
>
> Does Christoph's patch remove this constraint? We really should remove

No! If we start sheep at the same time, it may cause another problem: some
sheep will get an incomplete member list, which this patch can't fix.

> this constraint, it is kind of a bug.

I must say again, this is not a *bug*. We should understand the *essential*
difference between corosync and zookeeper:

1) Corosync pushes the member list to each sheep when it joins the cluster.
2) The zookeeper server does not do this; the joining sheep has to fetch the
   member list on its own.

Because of this difference, the zookeeper driver faces a new problem: how do
we package these two steps

  a. fetch the member list from the zookeeper server,
  b. update the member list in the zookeeper server (add itself to the
     member list)

into one transaction?

There is *no* way to fix this problem if we don't use a lock but still start
sheep at the same time.

In fact, we *just* need to start _the first_ sheep separately; after that, we
can start the other sheep *concurrently*. That is to say, once the first sheep
is running, this problem does not exist! 99.9% of the time, it will not
bother me.

If you want to fix this problem completely, the only way is to hack the
zookeeper server's code, but I don't think this is worth doing.

>
> Also, zookeeper has a very hideous defect: it needs a very long
> window (currently 30 seconds) to detect failed nodes. This would be
> catastrophic if IO were routed to failed nodes that sheep
> still think are alive.
> The fix I can think of is to add an active
> notification to the cluster when any sheep gets a confuse fused error,
> plus the current passive membership detection.

I plan to fix this problem. Maybe the best way is to lower the value of
SESSION_TIMEOUT.

>
> Thanks,
> Yuan

--
Yunkai Zhang
Work at Taobao
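A minimal Python sketch of the fetch-then-register race described above (steps a and b), under stated assumptions: the zookeeper member list is replaced by a toy in-memory object, and the names MemberList, join_interleaved and join_locked are hypothetical, not sheepdog or zookeeper APIs. It shows why two sheep that both run step a before either runs step b each end up with an incomplete view, and how holding one lock across both steps makes them behave like a single transaction.

```python
import threading
from typing import Dict, List, Set


class MemberList:
    """Toy stand-in for the member list kept on the zookeeper server (hypothetical)."""

    def __init__(self) -> None:
        self.members: Set[str] = set()
        self.lock = threading.Lock()

    def fetch(self) -> Set[str]:
        # Step a: the joining sheep reads the current member list.
        return set(self.members)

    def add(self, name: str) -> None:
        # Step b: the joining sheep registers itself.
        self.members.add(name)


def join_interleaved(server: MemberList, names: List[str]) -> Dict[str, Set[str]]:
    """Run step a for every joiner, then step b for every joiner.

    This models the worst case of sheep started at the same time: every
    sheep fetches the list before any sheep has registered itself.
    """
    views = {name: server.fetch() | {name} for name in names}
    for name in names:
        server.add(name)
    return views


def join_locked(server: MemberList, name: str) -> Set[str]:
    """Do steps a and b under one lock so they act like one transaction."""
    with server.lock:
        view = server.fetch() | {name}
        server.add(name)
    return view
```

With two joiners, join_interleaved leaves each sheep seeing only itself while the server holds both names, which is the incomplete member list described above. Calling join_locked from several threads instead produces views that nest (each later view contains the earlier ones), because the lock serializes steps a and b. Note that threading.Lock is process-local; across real sheep the equivalent would have to be a lock shared through the cluster, which this toy does not model.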