[sheepdog-users] is --nohalt dangerous?
namei.unix at gmail.com
Thu Jul 19 04:09:58 CEST 2012
On 07/19/2012 02:14 AM, Arnold Krille wrote:
> But you do get problems when you write to the last remaining node, that node
> dies (non-recoverable) and you bring back the other nodes. These nodes don't
> have a chance of knowing they have invalid data. Well they can know, because
> they might be shut down uncleanly. But then the remaining nodes know that they
> have invalid data, so what? You can't go on with that and have to bring in the
> backup you don't have...
This is exactly why halting is the default behavior. Without --nohalt, we
don't have this problem.
> For data consistency it would have been better if the cluster stopped writing
> after more than half of the copies died. And thus forced the admins to fix the
> nodes well before that even occurs.
> Setting a copy-value of more than one probably meant something for the admin
> regarding data-security. So it's safe to assume that he wants to protect
> himself against the scenario of the last node dying with the last consistent
> data on it.
> So, please give sheepdog real quorum calculation when there are more than two
> copies wanted.
Quorum fails in the case where a majority of the nodes go down at the same
time and are non-recoverable; in that case, we lose the updates.
We actually have a stronger constraint: if nr_nodes < copies, we
halt the cluster. I think this is the safest choice.
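
To make the comparison concrete, here is a rough sketch of the two halt rules side by side. It is only an illustration, not sheepdog code: the function names are made up, and the majority rule is my reading of the "more than half of the copies" proposal quoted above, taking nr_nodes as the number of live nodes and copies as the configured redundancy.

```python
def halts_strict(nr_nodes: int, copies: int) -> bool:
    """Rule described above: halt as soon as nr_nodes < copies."""
    return nr_nodes < copies


def halts_quorum(nr_nodes: int, copies: int) -> bool:
    """Hypothetical majority rule: halt once the live nodes are no longer
    a strict majority of the configured copies."""
    return 2 * nr_nodes <= copies


if __name__ == "__main__":
    copies = 3
    # Walk the cluster down from a full set of nodes to a single survivor.
    for nr_nodes in range(copies, 0, -1):
        print(
            f"nr_nodes={nr_nodes} copies={copies} "
            f"strict_halts={halts_strict(nr_nodes, copies)} "
            f"quorum_halts={halts_quorum(nr_nodes, copies)}"
        )
```

With copies = 3, the strict rule already halts at two live nodes, while the majority rule keeps accepting writes until only one node is left. So the strict rule halts in every case where the majority rule would, and earlier.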