I have already cleaned the damaged cluster. I guess it is possible to reproduce the error and then capture the output of "collie cluster info".<div><br></div><div>Anyway, the upcoming "collie cluster check" command is very good news.</div>


<div><br clear="all">Rubens de Souza Matos Júnior<br>

<br><br><div class="gmail_quote">2011/8/4 MORITA Kazutaka <span dir="ltr"><<a href="mailto:morita.kazutaka@lab.ntt.co.jp">morita.kazutaka@lab.ntt.co.jp</a>></span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">


At Thu, 4 Aug 2011 16:28:50 -0300,<br>

<div class="im">Rubens Matos wrote:<br>

> Hi everyone,<br>

><br>

> I am testing sheepdog and everything was working, but after an interruption<br>

> in power supply, that affected all nodes, the cluster was damaged so that<br>

> the nodes didn't join again, and I can't recover the data that was stored in<br>

> a VDI.<br>

><br>

> Have you already noticed a similar behavior? Is sheepdog protected against<br>

> such kind of failure, in which all nodes are abruptly disconnected?<br>

<br>

</div>Sheepdog should handle the total node failure, but I think some bugs<br>

still exist in it.  The error handling has not been tested enough.<br>

<br>

If you have not cleaned the damaged cluster yet, can you give me the<br>

outputs of "collie cluster info" on all the nodes?  Those info would<br>

be helpful to find the error reason.<br>

<br>

I'm implementing a "collie cluster check" command, which works like<br>

fsck for Sheepdog.  This command would be helpful for recovering the<br>

damaged cluster.<br>

<br>

<br>

Thanks,<br>

<font color="#888888"><br>

Kazutaka<br>

</font></blockquote></div><br></div>