[Sheepdog] Segmentation faults and cluster failure
MORITA Kazutaka
morita.kazutaka at lab.ntt.co.jp
Sat Sep 17 09:51:55 CEST 2011
At Fri, 16 Sep 2011 10:25:19 -0400,
Shawn Moore wrote:
>
> I will apologize in advance for this really long post, but I like to
> provide as much data as I can in advance.
Thanks for your report. It was really helpful. :)
I sent a patch to show a correct output of 'collie cluster info'
without segfault. Can you try it out?
> So doesn't look good. Then I do this same command "collie cluster
> info" on the other zone 2 node and it does the exact same thing
> "segmentation fault". Then I go to the zone 1 nodes and run it and
> they segfault as well. So now the cluster is completely down. I have
> tried to find the node with the highest epoch and start it back up
> first but no matter what I do I can't get the cluster up, always
> segfaults.
From your log messages, it looks like node174 stores a higher epoch.
I think if you run a sheep daemon on node174 first, Sheepdog would
work again.
Thanks,
Kazutaka
More information about the sheepdog
mailing list