[sheepdog-users] Cluster recover after losing 2 devices

Wed Jun 18 10:48:12 CEST 2014

2014-06-17 18:34 GMT+02:00 Andrew J. Hobbs <ajhobbs at desu.edu>:

> In your case, with 4 servers and a '-c 2', you can survive a single
> failure _at a time_ down

...

> Due to zoning, no machine has more than a single copy of any particular vdi
> block

...

> No more so than Raid 6 makes you immune to drive loss, simply immune to up
> to 2 failed drives.


Hi Andrew, thank you for your answer.
I already know these concepts.
In my first mail I wrote
"My redundancy schema is -c 2 so, in any case, it wasn't possible to keep
cluster consistency."
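
To make that point concrete, here is a minimal sketch of why two copies
cannot cover two simultaneous failures. It assumes a hypothetical layout
(4 nodes, one object for every possible pair of nodes); this is not
sheepdog's real placement logic, and the names NODES, placements and
lost_objects are made up for illustration. The same reasoning applies to
devices if each copy lives on exactly one device.

```python
from itertools import combinations

def lost_objects(placements, failed):
    """Return the objects whose every copy lived on a failed node."""
    failed = set(failed)
    return [obj for obj, nodes in placements.items() if set(nodes) <= failed]

# Hypothetical layout: 4 nodes, 2 copies per object, one object per node pair.
# This is NOT sheepdog's real placement algorithm, only an illustration.
NODES = ["n1", "n2", "n3", "n4"]
placements = {f"obj{i}": pair for i, pair in enumerate(combinations(NODES, 2))}

single = lost_objects(placements, ["n1"])        # one failure at a time
double = lost_objects(placements, ["n1", "n2"])  # two simultaneous failures

print(len(placements), len(single), len(double))  # prints: 6 0 1
```

With one failure every object still has a surviving copy; with two
simultaneous failures the object stored on exactly that pair has none.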

What I'm focusing on is that after I was able to _restore all cluster
objects_ (so all vdi), their content was corrupted.
More specifically, the content of the vdi that were in use by running guests
during the "crash".

My case couldn't be managed by sheepdog automatically, that's clear.
But it was possible to restore the cluster after "manual" intervention.

I'll use the analogy of a 3-disk raid 5 (mdadm) for simplicity.
I tried to simulate a similar scenario:
- removed a disk.
  (nothing happens: no rebuild, because the degraded array now has no
  redundancy left, like a raid 0)
- removed the second disk.
  The raid goes into failed status and the *filesystem becomes read-only*.
  Once the second disk is plugged back in, I'm able to re-assemble the
  failed raid with a specific procedure.
  The filesystem is still consistent.

This is an extreme case that obviously can't be managed automatically by
mdadm.
It would be bad if, after re-assembling the array, the file system was
corrupted.

Sheepdog should behave in an analogous way.
It already does in the case of simultaneous node loss.
It doesn't in the case of simultaneous device loss (on different nodes).