<div dir="ltr"><br><div class="gmail_extra"><br><br><div class="gmail_quote">2014-06-18 17:49 GMT+02:00 Andrew J. Hobbs <span dir="ltr"><<a href="mailto:ajhobbs@desu.edu" target="_blank">ajhobbs@desu.edu</a>></span>:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Perhaps I'm misunderstanding your scenario.<br>

<br>

You had a cluster running sheepdog md using -c 2. A device failure on one node triggered a node-level recovery of the lost blocks. However, your cluster was over-committed, so the node-level recovery overflowed the space left on that node. During this process, another node failed in similar fashion. Sheepdog was unable to recover on its own; you were able to recover the vdis manually, but their content was corrupt?<br>


<br>

Is that a valid summary of what happened in your situation?<br></blockquote><div><br></div><div>Exactly :-) <br><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


If so, then I'd argue the corruption most likely stems from not properly handling the over-committed state. So it's reasonable to say that the proper solution might be, during node recovery, to determine whether the committed space on that node will exceed the space left on its devices, and only in that case trigger a cluster re-weight procedure, which is quite expensive.<br>

</blockquote><div><br></div><div>Yes, expensive, but better than a full disk and its consequences.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Note there's still a window for data loss: if a missing piece is on another node's device that has also failed coincidentally, there's no location to pull that block from, and the cluster should probably halt.<br>

</blockquote><div><br></div><div>Agreed!<br></div><div>The loss of a single disk is as dangerous as the loss of a whole node.<br></div><div> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<br>

There's never a way to make this sort of thing invulnerable. There's always some corner case where data loss or corruption is possible.<br></blockquote><div><br></div><div>Yep, but in my case, if the cluster had halted when the second disk broke down, I'm pretty sure the content of the vdi would not have been corrupted.<br>
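<div>The space check proposed above (during node-level recovery, compare the data to be rebuilt with the space left on the node's devices, and fall back to the expensive cluster re-weight only when it would not fit) could look roughly like the minimal sketch below. It is written in Python for readability and is not sheepdog code: <code>choose_recovery_action</code>, <code>RecoveryAction</code> and the <code>reserve_bytes</code> safety margin are hypothetical names, and summing free space across devices is a simplification that ignores how objects are actually placed per device.</div>

```python
from enum import Enum
from typing import Sequence


class RecoveryAction(Enum):
    NODE_LOCAL = "node-local recovery"      # rebuild lost objects on the node's surviving disks
    CLUSTER_REWEIGHT = "cluster re-weight"  # expensive: rebalance data across all nodes


def choose_recovery_action(
    bytes_to_recover: int,
    free_bytes_per_device: Sequence[int],
    reserve_bytes: int = 0,
) -> RecoveryAction:
    """Decide how to recover after a device failure on one node (hypothetical helper).

    bytes_to_recover: size of the objects that lived on the failed device.
    free_bytes_per_device: free space on each surviving device of the same node.
    reserve_bytes: assumed safety margin kept free so recovery never fills the node.
    """
    usable = sum(free_bytes_per_device) - reserve_bytes
    if bytes_to_recover > usable:
        # Node-local recovery would overflow the node: pay for the cluster re-weight instead.
        return RecoveryAction.CLUSTER_REWEIGHT
    # The lost data fits on this node, so the cheap local recovery is safe.
    return RecoveryAction.NODE_LOCAL
```

<div>For example, rebuilding 300 units of data on a node whose surviving disks have 100 and 150 units free returns <code>CLUSTER_REWEIGHT</code>, while 200 units fits and returns <code>NODE_LOCAL</code>.</div>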

</div>So I think we can improve this.<br><br></div><div class="gmail_quote">It's a matter of halting the cluster for manual intervention in the extreme cases it can't handle by itself.<br></div><div class="gmail_quote"><br>
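<div>The halting condition discussed in this thread (stop when a missing object has no surviving copy to be pulled from, because its other replica sat on a device that also failed) can be expressed as a small check. This is a minimal Python sketch under stated assumptions, not sheepdog's implementation: <code>unrecoverable_objects</code>, <code>should_halt</code> and the object-to-device mapping are hypothetical, and device names such as <code>"n1:d0"</code> are illustrative only.</div>

```python
from typing import Iterable, List, Mapping, Set


def unrecoverable_objects(
    object_locations: Mapping[str, Set[str]],
    failed_devices: Iterable[str],
) -> List[str]:
    """Return the IDs of objects whose every copy sits on a failed device (hypothetical helper).

    object_locations: object ID -> devices holding a copy (two entries with -c 2).
    failed_devices: devices currently known to be broken.
    """
    failed = set(failed_devices)
    # An object is lost if no copy remains on a healthy device.
    return sorted(oid for oid, devices in object_locations.items() if not set(devices) - failed)


def should_halt(object_locations: Mapping[str, Set[str]], failed_devices: Iterable[str]) -> bool:
    """Halt for manual intervention instead of recovering from a source that no longer exists."""
    return bool(unrecoverable_objects(object_locations, failed_devices))
```

<div>With <code>{"obj-a": {"n1:d0", "n2:d1"}, "obj-b": {"n1:d1", "n3:d0"}}</code>, losing only <code>n1:d0</code> leaves every object recoverable, but losing both <code>n1:d0</code> and <code>n2:d1</code> makes <code>obj-a</code> unrecoverable, so the cluster would halt rather than continue and risk corrupting the vdi.</div>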

</div></div></div>