[Sheepdog] sheepdog Digest, Vol 23, Issue 11

Gustavo Callou grac at cin.ufpe.br
Wed Aug 10 21:16:51 CEST 2011


Dear Kazutaka,


I am writing to let you know that I am working with Rubens on testing the
Sheepdog environment.

We reproduced the issue in which all nodes of the Sheepdog cluster crash
when the power is cut.

The test is simple. We configured two machines running Sheepdog with the
latest development version (kazum-sheepdog-v0.2.3-35-g31f9a75.tar.gz),
available at
https://github.com/kazum/sheepdog/tree/31f9a75f828634681261144c406eb4ca359dd90c.
The Ubuntu Server edition (ubuntu-10.04.3-server-i386.iso) was installed
in the Alice VDI. Fig 1 shows this configuration. After starting qemu with
Alice's OS, we powered off both machines at the same time.


The output Sheepdog produced when we turned sheep2 back on is shown in
Fig 2. After that, we tried to start Sheepdog on the other machine (sheep1),
without success, as shown in Fig 3.


We performed another test, in which we shut down the cluster (both
machines); deleted the entire contents of the Sheepdog storage directory on
sheep1 (the machine that had been running qemu); started Sheepdog on sheep2;
and, after it had recovered, started Sheepdog on sheep1. Although the cluster
spent some time synchronizing sheep1, neither machine was able to boot the
OS from Alice again, as shown in Fig 4, because Alice's VDI was no longer
available.


Do you have any suggestions about what may be causing this problem? I would
also like to know whether the configuration used for the experiment was OK.


 Best regards,

Gustavo


On Sat, Aug 6, 2011 at 7:00 AM, <sheepdog-request at lists.wpkg.org> wrote:

> Today's Topics:
>
>   1. Re: Power supply interruption crashes data stored in sheepdog
>      (Fernando Frediani (Qube))
>   2. Re: Power supply interruption crashes data stored in sheepdog
>      (Rubens Matos)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 5 Aug 2011 10:52:14 +0000
> From: "Fernando Frediani (Qube)" <fernando.frediani at qubenet.net>
> To: 'Rubens Matos' <rubens.matos at gmail.com>
> Cc: "'sheepdog at lists.wpkg.org'" <sheepdog at lists.wpkg.org>
> Subject: Re: [Sheepdog] Power supply interruption crashes data stored
>        in sheepdog
> Message-ID:
>        <6EC7489C49252F4F823EAE91E3A939391C4F098E at QUBE-TR2-EXC01.qube.qubenet.net>
>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Rubens,
>
> Do you mean you recovered it?
> What did you do to get it working again?
>
> Thanks
>
> Fernando
>
> From: sheepdog-bounces at lists.wpkg.org [mailto:
> sheepdog-bounces at lists.wpkg.org] On Behalf Of Rubens Matos
> Sent: 05 August 2011 04:12
> To: MORITA Kazutaka
> Cc: sheepdog at lists.wpkg.org
> Subject: Re: [Sheepdog] Power supply interruption crashes data stored in
> sheepdog
>
> I have already cleaned the damaged cluster. I guess it is possible to
> reproduce the error, and then capture the output from collie cluster info.
>
> Anyway, the upcoming "collie cluster check" command is very good news.
>
> Rubens de Souza Matos Júnior
>
> 2011/8/4 MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp>
> At Thu, 4 Aug 2011 16:28:50 -0300,
> Rubens Matos wrote:
> > Hi everyone,
> >
> > I am testing sheepdog and everything was working, but after an
> > interruption in power supply, that affected all nodes, the cluster was
> > damaged so that the nodes didn't join again, and I can't recover the
> > data that was stored in a VDI.
> >
> > Have you already noticed a similar behavior? Is sheepdog protected
> > against such kind of failure, in which all nodes are abruptly
> > disconnected?
> Sheepdog should handle the total node failure, but I think some bugs
> still exist in it.  The error handling has not been tested enough.
>
> If you have not cleaned the damaged cluster yet, can you give me the
> output of "collie cluster info" on all the nodes?  That information
> would help to find the cause of the error.
>
> I'm implementing a "collie cluster check" command, which works like
> fsck for Sheepdog.  This command would be helpful for recovering the
> damaged cluster.
>
>
> Thanks,
>
> Kazutaka
>
> ------------------------------
>
> Message: 2
> Date: Fri, 5 Aug 2011 09:46:39 -0300
> From: Rubens Matos <rubens.matos at gmail.com>
> To: "Fernando Frediani (Qube)" <fernando.frediani at qubenet.net>
> Cc: "sheepdog at lists.wpkg.org" <sheepdog at lists.wpkg.org>
> Subject: Re: [Sheepdog] Power supply interruption crashes data stored
>        in sheepdog
> Message-ID:
>        <CAP2mMMntGe1s1Jq5=suiyKUS4shruc0Dx61xgWy1ZdGLhY_Qeg at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Fernando, I didn't recover the stored data. I removed the directory and
> started sheepdog again.
>
> Rubens
>



-- 
PhD Candidate in Computer Science
Federal University of Pernambuco
http://www.cin.ufpe.br/~grac
http://www.modcs.org
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Figures.zip
Type: application/zip
Size: 401084 bytes
Desc: not available
URL: <http://lists.wpkg.org/pipermail/sheepdog/attachments/20110810/a4a3bb2f/attachment-0002.zip>

