[Sheepdog] [ANNOUNCE] Sheepdog 0.2.0 released

Tue Jan 11 08:36:18 CET 2011

At Fri, 07 Jan 2011 17:11:32 -0800,
Serge Leschinsky wrote:
> 
> On 01/05/2011 04:38 AM, MORITA Kazutaka wrote:
> >> qemu-img convert  /tmp/ttt.img sheepdog:192.168.177.35:7000:tutu
> >> failed connect to localhost:7000
> >>
> >> qemu-img: sheepdog:192.168.177.35:7000:tutu: error while converting raw:
> >> Input/output error
> >>
> >
> > Could you try the following patch?
> >    http://lists.wpkg.org/pipermail/sheepdog/2010-October/000713.html
> >
> It works correctly now, thank you!
> 
> May I ask one more question? Is it mandatory to shut down the cluster (collie
> cluster shutdown) before all nodes are powered off?

Yes, but you can recover the cluster manually even if you power off
all nodes without first running 'collie cluster shutdown'.

> 
> I had a problem with my cluster after all nodes were shut down (correctly, of
> course). When I brought them up, I got
> 
> root@dl1:~ # collie node list
> The node had failed to join sheepdog
> failed to get node list
> 
> on all nodes (dl1, dl2, dl3). After recreating the cluster, everything seems
> to be OK.

In that case,
  1. run the following commands
       $ collie cluster info -a dl1
       $ collie cluster info -a dl2
       $ collie cluster info -a dl3
     and check which node has the largest epoch number.

  2. kill all sheep daemons on dl1, dl2, and dl3.

  3. start a sheep daemon on the node which has the largest epoch
  number.

  4. start sheep daemons on the other nodes.
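
As a rough sketch of how the start order in steps 3 and 4 follows from
the epoch numbers found in step 1: the helper below takes NODE=EPOCH
pairs and prints the nodes with the largest epoch first. Note that
start_order is a hypothetical helper, not part of collie, and the epoch
values are made up; you would read the real ones off the
'collie cluster info' output by hand.

```shell
# Hypothetical helper (not a collie command): print nodes ordered by
# epoch, largest first. Arguments are NODE=EPOCH pairs read by hand.
start_order() {
    for pair in "$@"; do
        # Emit "EPOCH NODE" so sort can compare on the epoch column.
        printf '%s %s\n' "${pair#*=}" "${pair%%=*}"
    done | sort -k1,1nr |
        awk '{ printf "%s%s", sep, $2; sep = " " } END { print "" }'
}

# Made-up epochs for illustration only.
start_order dl1=3 dl2=5 dl3=4
```

The first node printed is the one to start first (step 3); the rest
can be started afterwards (step 4). If two nodes report the same
epoch, this sketch does not define which of them comes first.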

I am thinking of adding a 'collie cluster check' command to check and
fix the sheepdog cluster state.

Thanks,

Kazutaka