[Sheepdog] unstable behavior when nodes join or leave

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Sat Sep 3 05:31:51 CEST 2011


At Fri, 2 Sep 2011 14:33:42 +0200,
Valerio Pachera wrote:
> 
> 2011/8/30 Keiichi SHIMA <shima at wide.ad.jp>:
> > 5. at some point, the sheep cluster stops working.
> >  'collie vdi list' starts showing errors ('failed to read a inode header ...')
> >  'collie node info' starts showing errors (the same message as above)
> 
> A few days ago I noticed that problem too.
> 
> # collie vdi list
>   name        id    size    used  shared    creation time   vdi id
> ------------------------------------------------------------------
> failed to read a inode header 10701927, 0, 42
> 
> This was on a small 3-node cluster.
> A vdi disk had been created while all 3 nodes were up.
> I shut off node3, and then I got the error.
> Anyway, the cluster didn't stop and completed the sync, after which
> the error message no longer appeared.
> 
> I tried to shut off node3 again, and this time there was no error message.
> 
> I'll download the latest sheepdog and see if I get any error messages
> when adding/removing nodes.
> 
> > What is the right way to remove a node from a cluster?
> I'm wondering the same.
> What happens if I kill all sheep processes but leave corosync running
> (and wait several minutes or even hours)? And vice versa?

The right way to remove a node is simply to kill the sheep daemon.  You
don't need to stop corosync.  To add the node back, just start the sheep
daemon again; corosync can keep running the whole time.

If you stop corosync without killing sheep first, the sheep daemons on
that node will stop automatically.
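A minimal shell sketch of that remove/re-add cycle follows. It assumes the sheep daemon is restarted with its store directory as its only argument; STORE_DIR and its default path are hypothetical, so adjust them to your setup. Corosync is never touched.

```shell
#!/bin/sh
# Sketch: take this node out of the sheepdog cluster and later add it back,
# leaving corosync running throughout.
# ASSUMPTION: sheep is restarted with its store directory as the only
# argument. STORE_DIR and the default path below are hypothetical.
STORE_DIR=${STORE_DIR:-/var/lib/sheepdog}

remove_node() {
    # Killing the sheep daemon is enough to remove the node.
    # -x matches the process name exactly, so processes whose names merely
    # contain "sheep" are left alone. Corosync keeps running.
    pkill -x sheep
}

readd_node() {
    # Starting sheep again rejoins the cluster; corosync was never stopped.
    sheep "$STORE_DIR"
}
```

Usage: run remove_node to take the node out, and readd_node when you want it back in the cluster.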

Thanks,

Kazutaka

> 
> Probably, the best way is to kill all sheep processes and corosync as
> soon as possible.
>   pkill sheep; /etc/init.d/corosync stop
> -- 
> sheepdog mailing list
> sheepdog at lists.wpkg.org
> http://lists.wpkg.org/mailman/listinfo/sheepdog


