[Sheepdog] Segmentation faults and cluster failure
MORITA Kazutaka
morita.kazutaka at lab.ntt.co.jp
Tue Sep 20 10:39:59 CEST 2011
At Mon, 19 Sep 2011 11:21:34 -0400,
Shawn Moore wrote:
>
> > I sent a patch to show a correct output of 'collie cluster info'
> > without segfault. Can you try it out?
>
> I went ahead and pulled down "77f26b4" as I was using "3a2801b" for my testing.
>
>
> > From your log messages, it looks like node174 stores a higher epoch.
> > I think if you run a sheep daemon on node174 first, Sheepdog would
> > work again.
>
> I had already tried starting node174 first, but with the new code, at
> least "collie cluster info" doesn't segfault anymore:
> [root@node174 ~]# collie cluster info
> Cluster status: Waiting for other nodes joining
>
> Creation time        Epoch Nodes
> 2011-09-15 20:21:18     17 [192.168.0.173:7000, 192.168.0.174:7000]
> 2011-09-15 20:21:18     16 [192.168.0.157:7000, 192.168.0.173:7000, 192.168.0.174:7000]
> 2011-09-15 20:21:18     15 [192.168.0.156:7000, 192.168.0.157:7000, 192.168.0.173:7000, 192.168.0.174:7000]
> 2011-09-15 20:21:18     14 [192.168.0.156:7000, 192.168.0.173:7000, 192.168.0.174:7000]
> 2011-09-15 20:21:18     13 [192.168.0.173:7000, 192.168.0.174:7000]
> 2011-09-15 20:21:18     12 [192.168.0.156:7000, 192.168.0.173:7000, 192.168.0.174:7000]
> 2011-09-15 20:21:18     11 [192.168.0.156:7000, 192.168.0.157:7000, 192.168.0.173:7000, 192.168.0.174:7000]
> 2011-09-15 20:21:18     10 [192.168.0.156:7000, 192.168.0.173:7000, 192.168.0.174:7000]

What happens if you then add node173, also running the new code? What
does 'collie cluster info' show on node173?

Thanks,
Kazutaka