[Sheepdog] Drive Failure

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Sat May 14 09:06:09 CEST 2011


At Sun, 24 Apr 2011 17:56:59 +0900,
MORITA Kazutaka wrote:
> 
> At Sun, 24 Apr 2011 01:14:36 -0500,
> Greg Zapp wrote:
> > 
> > If the sheep daemon were killed, would VMs on node A keep running?  That's the desired outcome.  I don't want a single storage error to cause any VMs to go down...
> 
> That's the final goal, though it is not supported yet.  The ideal
> behavior is that when the sheep on node A is killed, the VM on node A
> reconnects to the sheep on node B.  In my plan, this will be done in
> version 0.4.0.
>   https://sourceforge.net/apps/trac/sheepdog/ticket/1

On second thought, we don't need reconnection in this case.
What we need to do here is:
 - remove node A from the consistent hash ring
 - keep node A as a gateway node

I've posted a patch to support this, and it will be included in the
next version, 0.2.3.
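
To illustrate the idea, here is a minimal sketch.  The names and the
ring arithmetic are simplified for this mail (the real ring uses
virtual nodes; this collapses placement to a simple modulo), so it is
not the actual patch: a node whose local store has failed is skipped
when placing objects, but it stays in the cluster and can still
forward I/O requests as a gateway.

#include <stdint.h>
#include <stdio.h>

#define NR_NODES 4

struct node {
	const char *name;
	int store_ok;	/* 0 if the local store failed (gateway only) */
};

static struct node nodes[NR_NODES] = {
	{ "node-a", 0 },	/* unmounted store: gateway only */
	{ "node-b", 1 },
	{ "node-c", 1 },
	{ "node-d", 1 },
};

/* FNV-1a hash of the object id, used here only to spread objects
 * over the nodes deterministically. */
static uint64_t fnv1a(uint64_t oid)
{
	uint64_t h = 0xcbf29ce484222325ULL;
	unsigned char *p = (unsigned char *)&oid;
	for (unsigned i = 0; i < sizeof(oid); i++) {
		h ^= p[i];
		h *= 0x100000001b3ULL;
	}
	return h;
}

/* Pick the node that stores the object, skipping nodes whose store
 * has failed.  Skipped nodes remain reachable as gateways. */
static struct node *oid_to_node(uint64_t oid)
{
	int start = fnv1a(oid) % NR_NODES;
	for (int i = 0; i < NR_NODES; i++) {
		struct node *n = &nodes[(start + i) % NR_NODES];
		if (n->store_ok)
			return n;
	}
	return NULL;	/* no usable store left in the cluster */
}

int main(void)
{
	for (uint64_t oid = 1; oid <= 4; oid++) {
		struct node *n = oid_to_node(oid);
		printf("oid %llu -> %s\n", (unsigned long long)oid,
		       n ? n->name : "(none)");
	}
	return 0;
}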

Thanks,

Kazutaka

> 
> > 
> > The VDI is found, but the VM hangs during boot.
> 
> Thank you.  Currently, Sheepdog I/O is a bit unstable while node
> membership changes, and I guess that caused the data loss in your VM.
> I'll fix this soon.
> 
> Thanks,
> 
> Kazutaka
> 
> 
> > 
> > On Apr 23, 2011, at 11:54 PM, MORITA Kazutaka <morita.kazutaka at gmail.com> wrote:
> > 
> > > Hi,
> > > 
> > > Thanks for your feedback.
> > > 
> > > At Sat, 23 Apr 2011 20:17:33 -0500,
> > > Greg Zapp wrote:
> > >> I have sheepdog running on two nodes.  The sheepdog store is a single drive
> > >> separate from the OS drive.  I'm running an Ubuntu VM on node A.  If I force
> > >> unmount the store, the VM craps itself.  Shouldn't sheepdog be able to
> > >> gracefully handle a drive failure or unmounted file system?
> > > 
> > > Yes, it really should be handled...  In this case, Sheepdog must do
> > > the following automatically:
> > > 
> > >  - kill the sheep daemon on node A when the store is unavailable
> > >  - fail over the VM connection to node B
> > > 
> > > In particular, the first point needs to be resolved as soon as
> > > possible, because one disk failure shouldn't affect the availability
> > > of the whole system.
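
As a side note, the first point roughly amounts to treating a
disk-level error on the store as fatal for the daemon.  A minimal
sketch, with made-up function and path names, not the actual sheep
code:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Wrap store I/O so a disk-level failure takes the whole daemon
 * down instead of leaving a node that silently loses writes.  Once
 * the process exits, the cluster driver notices the node leaving
 * and the remaining replicas serve the data. */
static ssize_t store_pwrite(int fd, const void *buf, size_t len, off_t off)
{
	ssize_t ret = pwrite(fd, buf, len, off);
	if (ret < 0 && (errno == EIO || errno == EROFS || errno == ENOSPC)) {
		fprintf(stderr, "object store failed: %s\n", strerror(errno));
		exit(EXIT_FAILURE);
	}
	return ret;
}

int main(void)
{
	int fd = open("/tmp/sheep-store-test", O_CREAT | O_WRONLY, 0644);
	if (fd < 0)
		return 1;
	char obj[16] = "object data";
	store_pwrite(fd, obj, sizeof(obj), 0);	/* would exit on EIO */
	close(fd);
	return 0;
}

The real fix would also need to cover reads, but the principle is
the same.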
> > > 
> > >> 
> > >> Update:
> > >> 
> > >> In order to bring the VM up on node B, I had to kill sheep on node A.
> > >> However, when I remounted the drive and brought sheep back up on node
> > >> A, it ruined everything.  I shut down sheep on node A, cleared the
> > >> store directory, and brought it back up.  After it synced, I was still
> > >> unable to boot the VM on node A or node B.  It's hosed.
> > > 
> > > This should be handled in the current version, so it seems to be a
> > > bug.  Unfortunately, I couldn't reproduce the problem in my
> > > environment.  Could your VM not find the sheepdog volume at all?  Or
> > > did it find the volume but hang during boot?
> > > 
> > > 
> > > Thanks,
> > > 
> > > Kazutaka


