[Sheepdog] Drive Failure

Sun Apr 24 10:56:59 CEST 2011

At Sun, 24 Apr 2011 01:14:36 -0500,
Greg Zapp wrote:
> 
> If the sheep daemon were killed, would VMs on node A keep running?  That's the desired outcome.  I don't want a single storage error to cause any VMs to go down...

That's the final goal, though it is not supported yet.  The ideal
behavior is that when the sheep on node A is killed, the VM on node A
reconnects to the sheep on node B.  I plan to implement this in
version 0.4.0.
  https://sourceforge.net/apps/trac/sheepdog/ticket/1
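
As a rough illustration of the reconnect idea only (this is not
Sheepdog or QEMU code, and the function name connect_with_failover
is hypothetical), a client could try its local daemon first and fall
back to the next node when the connection fails:

```python
import socket
from typing import Iterable, Optional, Tuple

Address = Tuple[str, int]


def connect_with_failover(addresses: Iterable[Address],
                          timeout: float = 2.0) -> Tuple[socket.socket, Address]:
    """Return a connected socket and the address it is connected to.

    Addresses are tried in order, so list the local daemon first and
    the daemons on other nodes after it.
    """
    last_error: Optional[OSError] = None
    for addr in addresses:
        try:
            sock = socket.create_connection(addr, timeout=timeout)
            return sock, addr
        except OSError as exc:
            # This node is unreachable; remember why and try the next one.
            last_error = exc
    raise ConnectionError(f"no reachable node: {last_error}")
```

For example, calling it with [("127.0.0.1", 7000), ("node-b", 7000)]
would use node B only when the local daemon refuses the connection.
The host name and port here are placeholders.  A real implementation
would also have to resend the I/O requests that were in flight when
the first connection dropped, which this sketch does not attempt.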

> 
> The vdi is found but the vm hangs during boot.

Thank you.  Currently, Sheepdog I/O is a bit unstable while node
membership changes, and I guess that caused data loss in your VM.
I'll fix it soon.

Thanks,

Kazutaka

> 
> On Apr 23, 2011, at 11:54 PM, MORITA Kazutaka <morita.kazutaka at gmail.com> wrote:
> 
> > Hi,
> > 
> > Thanks for your feedback.
> > 
> > At Sat, 23 Apr 2011 20:17:33 -0500,
> > Greg Zapp wrote:
> >> I have sheepdog running on two nodes.  The sheepdog store is a single drive
> >> separate from the OS drive.  I'm running an ubuntu VM on node A.  If I force
> >> unmount the store, the VM craps itself.  Shouldn't sheepdog be able to
> >> gracefully handle a drive failure or unmounted file system?
> > 
> > Yes, it really should be handled...  In this case, Sheepdog must do
> > the following automatically:
> > 
> >  - kill the sheep daemon on node A when the store is unavailable
> >  - fail over the VM connection to node B
> > 
> > In particular, the first needs to be resolved as soon as possible
> > because one disk failure shouldn't affect the availability of the
> > whole system.
> > 
> >> 
> >> Update:
> >> 
> >> In order to bring the VM up on node B, I had to kill sheep on node A.
> >> However, when I remounted the drive and brought sheep back up on node A it
> >> ruined everything.  I shut down sheep on node A and cleared the store
> >> directory then brought it back up.  After it synced I was still unable to
> >> boot the VM on node A or node B.  It's hosed.
> > 
> > This should be handled in the current version and seems to be a bug.
> > Unfortunately, I couldn't reproduce the problem in my environment.
> > Your VM couldn't even find the sheepdog volume?  Or your VM found the
> > volume but hung during boot?
> > 
> > 
> > Thanks,
> > 
> > Kazutaka