[Sheepdog] Drive Failure
MORITA Kazutaka
morita.kazutaka at lab.ntt.co.jp
Sat May 14 09:06:09 CEST 2011
At Sun, 24 Apr 2011 17:56:59 +0900,
MORITA Kazutaka wrote:
>
> At Sun, 24 Apr 2011 01:14:36 -0500,
> Greg Zapp wrote:
> >
> > If the sheep daemon were killed, would VMs on node A keep running? That's the desired outcome. I don't want a single storage error to cause any VMs to go down...
>
> That's the final goal, though it is not supported yet. The ideal
> behavior is that when the sheep on node A is killed, the VM on node A
> reconnects to the sheep on node B. In my plan, this will be done in
> version 0.4.0.
> https://sourceforge.net/apps/trac/sheepdog/ticket/1
On second thought, we don't need reconnection in this case.
What we need to do here is:
- remove node A from a consistent hash ring
- keep node A as a gateway node
I've posted a patch to support this, and it will be included in the
next version, 0.2.3.
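To illustrate the two steps above, here is a minimal, hypothetical Python model of the idea (the `Ring` class, its methods, and the node names are illustrative only, not Sheepdog's actual code):

```python
import hashlib
from bisect import bisect_left, insort

class Ring:
    """Toy consistent-hash ring where a node can be demoted to a
    gateway: it stays a cluster member but holds no data."""

    def __init__(self):
        self._keys = []        # sorted hash positions of data nodes
        self._nodes = {}       # hash position -> node name
        self.gateways = set()  # members that forward I/O but store nothing

    @staticmethod
    def _hash(value):
        return int(hashlib.sha1(value.encode()).hexdigest(), 16)

    def add_node(self, name):
        h = self._hash(name)
        insort(self._keys, h)
        self._nodes[h] = name
        self.gateways.discard(name)

    def make_gateway(self, name):
        """Remove `name` from the data ring but keep it as a gateway."""
        h = self._hash(name)
        self._keys.remove(h)
        del self._nodes[h]
        self.gateways.add(name)

    def locate(self, obj_id):
        """Return the data node responsible for `obj_id`."""
        h = self._hash(obj_id)
        i = bisect_left(self._keys, h) % len(self._keys)
        return self._nodes[self._keys[i]]
```

In this sketch, demoting node A to a gateway routes every object lookup to the remaining data nodes while node A stays in the cluster, which matches the intended behavior described above.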
Thanks,
Kazutaka
>
> >
> > The vdi is found but the vm hangs during boot.
>
> Thank you. Currently, Sheepdog I/O is a bit unstable while node
> membership changes, and I guess that caused the data loss in your VM.
> I'll fix this soon.
>
> Thanks,
>
> Kazutaka
>
>
> >
> > On Apr 23, 2011, at 11:54 PM, MORITA Kazutaka <morita.kazutaka at gmail.com> wrote:
> >
> > > Hi,
> > >
> > > Thanks for your feedback.
> > >
> > > At Sat, 23 Apr 2011 20:17:33 -0500,
> > > Greg Zapp wrote:
> > >> I have sheepdog running on two nodes. The sheepdog store is a single drive
> > >> separate from the OS drive. I'm running an ubuntu VM on node A. If I force
> > >> unmount the store, the VM craps itself. Shouldn't sheepdog be able to
> > >> gracefully handle a drive failure or unmounted file system?
> > >
> > > Yes, it really should be handled... In this case, Sheepdog must do
> > > the following automatically:
> > >
> > > - kill the sheep daemon on node A when the store is unavailable
> > > - fail over the VM connection to node B
> > >
> > > In particular, the first needs to be resolved as soon as possible,
> > > because a single disk failure shouldn't affect the availability of
> > > the whole system.
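As a rough illustration of the first step, the store check could be a periodic write probe that triggers daemon shutdown on failure (a hypothetical sketch only; Sheepdog's real failure detection may work differently):

```python
import os

def store_available(path):
    """Probe the store by writing and removing a temp file.
    Hypothetical check, not Sheepdog's actual mechanism."""
    probe = os.path.join(path, ".sheep-probe")
    try:
        with open(probe, "w") as f:
            f.write("ok")
        os.remove(probe)
        return True
    except OSError:
        # Covers a forced unmount, a read-only remount, a dead disk, etc.
        return False

def monitor_once(path, kill_daemon):
    """Kill the local sheep daemon when the store becomes unavailable,
    so one disk failure does not take the whole cluster down."""
    if not store_available(path):
        kill_daemon()
```

If the probe fails (for example, after a forced unmount), the local daemon would exit and the remaining nodes would continue serving the data.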
> > >
> > >>
> > >> Update:
> > >>
> > >> In order to bring the VM up on node B, I had to kill sheep on node A.
> > >> However, when I remounted the drive and brought sheep back up on node A it
> > >> ruined everything. I shut down sheep on node A and cleared the store
> > >> directory then brought it back up. After it synced I was still unable to
> > >> boot the VM on node A or node B. It's hosed.
> > >
> > > This should be handled in the current version and seems to be a bug.
> > > Unfortunately, I couldn't reproduce the problem in my environment.
> > > Your VM couldn't even find the sheepdog volume? Or your VM found the
> > > volume but hung during boot?
> > >
> > >
> > > Thanks,
> > >
> > > Kazutaka