[Sheepdog] unstable behavior when nodes join or leave

Fri Sep 2 14:33:42 CEST 2011

2011/8/30 Keiichi SHIMA <shima at wide.ad.jp>:
> 5. at some point, the sheep cluster stops working.
>  'collie vdi list' starts showing errors ('failed to read a inode header ...')
>  'collie node info' starts showing errors (the same message as above)

A few days ago I noticed that problem too:

# collie vdi list
  name        id    size    used  shared    creation time   vdi id
------------------------------------------------------------------
failed to read a inode header 10701927, 0, 42

This was on a small 3-node cluster.
A VDI had been created while all 3 nodes were up.
I shut down node3, and then I got the error.
Anyway, the cluster didn't stop and it completed the sync, after which
the error message was no longer shown.
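
To check when the error goes away after a node leaves, one could poll
'collie vdi list' until the message disappears. This is only a rough
sketch: the function name wait_for_vdi_list, the retry count and the
delay are made up for illustration, and it assumes the failure always
contains the text shown in the output above. It does not look at the
exit status of collie.

```shell
#!/bin/sh
# Hypothetical helper (not part of sheepdog): poll 'collie vdi list'
# until the "failed to read a inode header" message is no longer printed.
# $1 = maximum number of attempts (assumed default 30)
# $2 = seconds to wait between attempts (assumed default 10)
wait_for_vdi_list() {
    max_tries=${1:-30}
    delay=${2:-10}
    i=1
    while [ "$i" -le "$max_tries" ]; do
        # Capture stdout and stderr, since we only match on the message text.
        out=$(collie vdi list 2>&1)
        case $out in
            *"failed to read a inode header"*)
                # Still failing: wait and retry.
                ;;
            *)
                echo "vdi list clean after $i attempt(s)"
                return 0
                ;;
        esac
        sleep "$delay"
        i=$((i + 1))
    done
    echo "vdi list still failing after $max_tries attempts" >&2
    return 1
}
```

For example, 'wait_for_vdi_list 60 5' would retry for roughly five
minutes before giving up.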

I tried shutting down node3 again, and this time there was no error message.

I will download the latest sheepdog and see whether I get any error
message when adding/removing nodes.

> What is the right way to remove a node from a cluster?
I'm wondering the same.
What happens if I kill all sheep processes but not corosync (and wait
several minutes or even hours)? And vice versa?

Probably the best way is to kill all sheep processes and corosync as
soon as possible:
  pkill sheep; /etc/init.d/corosync stop
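
The same order (sheep first, then corosync) could be wrapped in a small
script. This is just a sketch of the guess above, not a confirmed
procedure for leaving a cluster. The names run and leave_cluster and
the DRY_RUN switch are invented here; with DRY_RUN=1 the commands are
only printed, which makes it easy to check before trying it on a real
node.

```shell
#!/bin/sh
# Hypothetical wrapper around the two commands above.
# DRY_RUN=1 prints each command instead of executing it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

leave_cluster() {
    # pkill exits non-zero when no process matched; ignore that so
    # corosync is still stopped afterwards.
    run pkill sheep || true
    run /etc/init.d/corosync stop
}
```

Running it with DRY_RUN=1 set and then calling leave_cluster prints the
two commands in order without touching the node.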