On 07/19/2012 02:14 AM, Arnold Krille wrote:
> But you do get problems when you write to the last remaining node, that node
> dies (non-recoverable) and you bring back the other nodes. These nodes don't
> have a chance of knowing they have invalid data. Well they can know, because
> they might be shut down uncleanly. But then the remaining nodes know that they
> have invalid data, so what? You can't go on with that and have to bring in the
> backup you don't have...

This is exactly why halting is the default behavior. Without -nohalt, we
don't have this problem.

> For data consistency it would have been better if the cluster stopped writing
> after more than half of the copies died. And thus forced the admins to fix the
> nodes well before that even occurs.
>
> Setting a copy-value of more than one probably meant something for the admin
> regarding data-security. So it's safe to assume that he wants to protect
> himself against the scenario of the last node dying with the last consistent
> data on it.
>
> So, please give sheepdog real quorum calculation when there are more than two
> copies wanted.

Quorum still fails if a majority of the nodes go down at the same time and
are non-recoverable; in that case we lose the updates.

We actually have a stronger constraint: if nr_nodes < copies, we halt the
cluster. I think this is the safest choice.

Thanks,
Yuan
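
P.S. To make the comparison concrete, here is a minimal sketch of the two
halt rules discussed above. It is not sheepdog code: the function names are
hypothetical, nr_nodes is taken to mean the number of live nodes, and the
majority rule is only one possible reading of the "real quorum" request
(writes continue while more than half of the copies can still be placed).

```python
def halts_strict(nr_nodes: int, copies: int) -> bool:
    """Rule from the reply above: halt as soon as nr_nodes < copies."""
    return nr_nodes < copies


def halts_majority(nr_nodes: int, copies: int) -> bool:
    """Assumed majority quorum: halt unless more than half of the
    copies can be written, i.e. unless 2 * nr_nodes > copies."""
    return 2 * nr_nodes <= copies


def compare(copies: int):
    """Return (nr_nodes, strict_halts, majority_halts) for every
    live-node count from `copies` down to 0."""
    return [
        (n, halts_strict(n, copies), halts_majority(n, copies))
        for n in range(copies, -1, -1)
    ]


if __name__ == "__main__":
    for n, strict, majority in compare(3):
        print(f"nr_nodes={n} strict_halts={strict} majority_halts={majority}")
```

With copies = 3, the strict rule already halts at two live nodes, while the
assumed majority rule keeps accepting writes until only one is left. Under
these assumptions, every situation that would stop a majority quorum also
stops the strict rule, which is why I call it the stronger constraint.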