[Sheepdog] Segmentation faults and cluster failure

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Tue Sep 20 10:39:59 CEST 2011


At Mon, 19 Sep 2011 11:21:34 -0400,
Shawn Moore wrote:
> 
> > I sent a patch to show a correct output of 'collie cluster info'
> > without segfault.  Can you try it out?
> 
> I went ahead and pulled down "77f26b4" as I was using "3a2801b" for my testing.
> 
> 
> > From your log messages, it looks like node174 stores a higher epoch.
> > I think if you run a sheep daemon on node174 first, Sheepdog would
> > work again.
> 
> I had already tried starting node174 first, but with the new code, at
> least "collie cluster info" doesn't segfault anymore:
> [root@node174 ~]# collie cluster info
> Cluster status: Waiting for other nodes joining
> 
> Creation time        Epoch Nodes
> 2011-09-15 20:21:18     17 [192.168.0.173:7000, 192.168.0.174:7000]
> 2011-09-15 20:21:18     16 [192.168.0.157:7000, 192.168.0.173:7000, 192.168.0.174:7000]
> 2011-09-15 20:21:18     15 [192.168.0.156:7000, 192.168.0.157:7000, 192.168.0.173:7000, 192.168.0.174:7000]
> 2011-09-15 20:21:18     14 [192.168.0.156:7000, 192.168.0.173:7000, 192.168.0.174:7000]
> 2011-09-15 20:21:18     13 [192.168.0.173:7000, 192.168.0.174:7000]
> 2011-09-15 20:21:18     12 [192.168.0.156:7000, 192.168.0.173:7000, 192.168.0.174:7000]
> 2011-09-15 20:21:18     11 [192.168.0.156:7000, 192.168.0.157:7000, 192.168.0.173:7000, 192.168.0.174:7000]
> 2011-09-15 20:21:18     10 [192.168.0.156:7000, 192.168.0.173:7000, 192.168.0.174:7000]

What happens if you add node173 running the new code?  What does
'collie cluster info' report on node173?
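
To compare the two nodes, it may help to reduce each epoch log to its
newest entry.  Below is a rough Python sketch, assuming the output
format shown in the listing above; the names parse_epochs and
latest_epoch are made up for illustration and are not part of
Sheepdog.

```python
import re

# Start of an epoch row, e.g. "2011-09-15 20:21:18     17 [192.168..."
# (format assumed from the 'collie cluster info' listing in this thread).
ROW = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\s+(\d+)\s+\[(.*)$")
NODE = re.compile(r"\d+\.\d+\.\d+\.\d+:\d+")


def parse_epochs(text):
    """Return {epoch: [node, ...]} parsed from 'collie cluster info' text."""
    epochs = {}
    current = None  # epoch whose node list is still open (wrapped line)
    for raw in text.splitlines():
        line = raw.lstrip("> ").strip()  # tolerate mail quoting
        m = ROW.match(line)
        if m:
            current = int(m.group(1))
            epochs[current] = []
            rest = m.group(2)
        elif current is not None and line:
            rest = line  # continuation of a wrapped node list
        else:
            continue
        epochs[current] += NODE.findall(rest)
        if rest.endswith("]"):
            current = None  # node list closed
    return epochs


def latest_epoch(text):
    """Return (epoch, nodes) for the highest epoch, or None if none found."""
    epochs = parse_epochs(text)
    if not epochs:
        return None
    newest = max(epochs)
    return newest, epochs[newest]
```

Feeding it the saved output from node173 and from node174 would show
which node stores the higher epoch and which members each one expects.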

Thanks,

Kazutaka



