<style type="text/css">
<!--
@page { margin: 0.79in }
P { margin-bottom: 0.08in }
A:link { so-language: zxx }
-->
</style>
<p style="margin-bottom: 0in">Dear Kazutaka,</p>
<p style="margin-bottom: 0in"><br>
</p>
<p style="margin-bottom: 0in">I am writing to let you know that I am working
with Rubens on testing the Sheepdog environment.</p>
<p style="margin-bottom: 0in">We reproduced the issue in which all
nodes of the Sheepdog cluster crashed after a power
outage.</p>
<p style="margin-bottom: 0in">The test is simple. We
configured two machines running a recent development version of
Sheepdog (kazum-sheepdog-v0.2.3-35-g31f9a75.tar.gz), available at
<a href="https://github.com/kazum/sheepdog/tree/31f9a75f828634681261144c406eb4ca359dd90c">https://github.com/kazum/sheepdog/tree/31f9a75f828634681261144c406eb4ca359dd90c</a>.
We then installed Ubuntu Server (ubuntu-10.04.3-server-i386.iso)
in the Alice VDI. Fig. 1 shows this
configuration. After booting Alice's OS in QEMU, we
powered off both machines at the same time.
</p>
<p style="margin-bottom: 0in"><br>
</p>
<p style="margin-bottom: 0in">Fig. 2 shows the Sheepdog output obtained when we
powered sheep2 back on. After that, we tried to start Sheepdog on the
other machine (sheep1), without
success, as shown in Fig. 3.
</p>
<p style="margin-bottom: 0in"><br>
</p>
<p style="margin-bottom: 0in">We performed another test, in which we
shut down the cluster (both machines), deleted the entire contents of the
Sheepdog storage directory on sheep1 (the machine that had been running QEMU),
started Sheepdog on sheep2 and, after it had
recovered, started Sheepdog on sheep1. Although the cluster spent
some time synchronizing sheep1, neither machine
was able to boot Alice's OS again, as shown in Fig. 4, because
Alice's VDI was no longer available.
</p>
<p style="margin-bottom: 0in"><br>
</p>
<p style="margin-bottom: 0in">Do you have any suggestions about what
may be causing this problem? I would also like to know whether the
configuration we used for the experiment was correct.</p>
<p style="margin-bottom: 0in"><br>
</p>
<p style="margin-bottom: 0in">Best regards,</p>
<p style="margin-bottom: 0in">Gustavo</p>
<br><br><div class="gmail_quote">On Sat, Aug 6, 2011 at 7:00 AM, <span dir="ltr"><<a href="mailto:sheepdog-request@lists.wpkg.org" target="_blank">sheepdog-request@lists.wpkg.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Send sheepdog mailing list submissions to<br>
<a href="mailto:sheepdog@lists.wpkg.org" target="_blank">sheepdog@lists.wpkg.org</a><br>
<br>
To subscribe or unsubscribe via the World Wide Web, visit<br>
<a href="http://lists.wpkg.org/mailman/listinfo/sheepdog" target="_blank">http://lists.wpkg.org/mailman/listinfo/sheepdog</a><br>
or, via email, send a message with subject or body 'help' to<br>
<a href="mailto:sheepdog-request@lists.wpkg.org" target="_blank">sheepdog-request@lists.wpkg.org</a><br>
<br>
You can reach the person managing the list at<br>
<a href="mailto:sheepdog-owner@lists.wpkg.org" target="_blank">sheepdog-owner@lists.wpkg.org</a><br>
<br>
When replying, please edit your Subject line so it is more specific<br>
than "Re: Contents of sheepdog digest..."<br>
<br>
<br>
Today's Topics:<br>
<br>
1. Re: Power supply interruption crashes data stored in sheepdog<br>
(Fernando Frediani (Qube))<br>
2. Re: Power supply interruption crashes data stored in sheepdog<br>
(Rubens Matos)<br>
<br>
<br>
----------------------------------------------------------------------<br>
<br>
Message: 1<br>
Date: Fri, 5 Aug 2011 10:52:14 +0000<br>
From: "Fernando Frediani (Qube)" <<a href="mailto:fernando.frediani@qubenet.net" target="_blank">fernando.frediani@qubenet.net</a>><br>
To: 'Rubens Matos' <<a href="mailto:rubens.matos@gmail.com" target="_blank">rubens.matos@gmail.com</a>><br>
Cc: "'<a href="mailto:sheepdog@lists.wpkg.org" target="_blank">sheepdog@lists.wpkg.org</a>'" <<a href="mailto:sheepdog@lists.wpkg.org" target="_blank">sheepdog@lists.wpkg.org</a>><br>
Subject: Re: [Sheepdog] Power supply interruption crashes data stored<br>
in sheepdog<br>
Message-ID:<br>
<<a href="mailto:6EC7489C49252F4F823EAE91E3A939391C4F098E@QUBE-TR2-EXC01.qube.qubenet.net" target="_blank">6EC7489C49252F4F823EAE91E3A939391C4F098E@QUBE-TR2-EXC01.qube.qubenet.net</a>><br>
<br>
Content-Type: text/plain; charset="iso-8859-1"<br>
<br>
Rubens,<br>
<br>
Do you mean you recovered it ?<br>
What have you do to get it working again ?<br>
<br>
Obrigado<br>
<br>
Fernando<br>
<br>
From: <a href="mailto:sheepdog-bounces@lists.wpkg.org" target="_blank">sheepdog-bounces@lists.wpkg.org</a> [mailto:<a href="mailto:sheepdog-bounces@lists.wpkg.org" target="_blank">sheepdog-bounces@lists.wpkg.org</a>] On Behalf Of Rubens Matos<br>
Sent: 05 August 2011 04:12<br>
To: MORITA Kazutaka<br>
Cc: <a href="mailto:sheepdog@lists.wpkg.org" target="_blank">sheepdog@lists.wpkg.org</a><br>
Subject: Re: [Sheepdog] Power supply interruption crashes data stored in sheepdog<br>
<br>
I have already cleaned the damaged cluster. I guess it is possible to reproduce the error, and then capture the output from collie cluster info.<br>
<br>
Anyway, the upcoming "collie cluster check" command is a very good news.<br>
<br>
Rubens de Souza Matos Júnior<br>
<br>
2011/8/4 MORITA Kazutaka <<a href="mailto:morita.kazutaka@lab.ntt.co.jp" target="_blank">morita.kazutaka@lab.ntt.co.jp</a>><br>
At Thu, 4 Aug 2011 16:28:50 -0300,<br>
Rubens Matos wrote:<br>
> Hi everyone,<br>
><br>
> I am testing sheepdog and everything was working, but after an interruption<br>
> in power supply, that affected all nodes, the cluster was damaged so that<br>
> the nodes didn't join again, and I can't recover the data that was stored in<br>
> a VDI.<br>
><br>
> Have you already noticed a similar behavior? Is sheepdog protected against<br>
> such kind of failure, in which all nodes are abruptly disconnected?<br>
Sheepdog should handle the total node failure, but I think some bugs<br>
still exist in it. The error handling has not been tested enough.<br>
<br>
If you have not cleaned the damaged cluster yet, can you give me the<br>
outputs of "collie cluster info" on all the nodes? Those info would<br>
be helpful to find the error reason.<br>
<br>
I'm implementing a "collie cluster check" command, which works like<br>
fsck for Sheepdog. This command would be helpful for recovering the<br>
damaged cluster.<br>
<br>
<br>
Thanks,<br>
<br>
Kazutaka<br>
<br>
<br>
------------------------------<br>
<br>
Message: 2<br>
Date: Fri, 5 Aug 2011 09:46:39 -0300<br>
From: Rubens Matos <<a href="mailto:rubens.matos@gmail.com" target="_blank">rubens.matos@gmail.com</a>><br>
To: "Fernando Frediani (Qube)" <<a href="mailto:fernando.frediani@qubenet.net" target="_blank">fernando.frediani@qubenet.net</a>><br>
Cc: "<a href="mailto:sheepdog@lists.wpkg.org" target="_blank">sheepdog@lists.wpkg.org</a>" <<a href="mailto:sheepdog@lists.wpkg.org" target="_blank">sheepdog@lists.wpkg.org</a>><br>
Subject: Re: [Sheepdog] Power supply interruption crashes data stored<br>
in sheepdog<br>
Message-ID:<br>
<CAP2mMMntGe1s1Jq5=<a href="mailto:suiyKUS4shruc0Dx61xgWy1ZdGLhY_Qeg@mail.gmail.com" target="_blank">suiyKUS4shruc0Dx61xgWy1ZdGLhY_Qeg@mail.gmail.com</a>><br>
Content-Type: text/plain; charset="iso-8859-1"<br>
<br>
Fernando, I didn't recovered the stored data. I removed the directory and<br>
started sheepdog again.<br>
<br>
Rubens<br>
<br>
<br>
<br>
------------------------------<br>
<br>
<br>
<br>
End of sheepdog Digest, Vol 23, Issue 11<br>
****************************************<br>
</blockquote></div><br><br clear="all"><br>-- <br><div>PhD Candidate in Computer Science<br>Federal University of Pernambuco</div><div><a href="http://www.cin.ufpe.br/%7Egrac" target="_blank">http://www.cin.ufpe.br/~grac</a><br>
<a href="http://www.modcs.org/" target="_blank">http://www.modcs.org</a></div><br>