[Sheepdog] Segmentation faults and cluster failure

MORITA Kazutaka morita.kazutaka at lab.ntt.co.jp
Sat Sep 17 09:51:55 CEST 2011


At Fri, 16 Sep 2011 10:25:19 -0400,
Shawn Moore wrote:
> 
> I will apologize in advance for this really long post, but I like to
> provide as much data as I can in advance.

Thanks for your report.  It was really helpful. :)

I sent a patch that makes 'collie cluster info' show the correct
output without a segfault.  Can you try it out?

> So doesn't look good.  Then I do this same command "collie cluster
> info" on the other zone 2 node and it does the exact same thing
> "segmentation fault".  Then I go to the zone 1 nodes and run it and
> they segfault as well.  So now the cluster is completely down.  I have
> tried to find the node with the highest epoch and start it back up
> first but no matter what I do I can't get the cluster up, always
> segfaults.

From your log messages, it looks like node174 stores the highest epoch.
I think if you start the sheep daemon on node174 first, Sheepdog should
work again.
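
As a rough illustration of picking which node to start first, here is
a minimal Python sketch.  It assumes you have already collected each
node's latest epoch number by hand.  The function name, the node names
other than node174, and all epoch values are made up for the example;
they are not taken from your logs.

```python
def node_to_start_first(epochs):
    """Return the name of the node that holds the largest epoch number.

    `epochs` maps a node name to the latest epoch that node has stored.
    If several nodes share the largest epoch, the first one listed wins.
    """
    if not epochs:
        raise ValueError("no epoch information given")
    return max(epochs, key=epochs.get)


# Hypothetical epoch values, for illustration only.
epochs = {"nodeA": 5, "node174": 6, "nodeB": 5}
print(node_to_start_first(epochs))
```

With these example values the sketch picks node174, so that is where
the sheep daemon would be started first, and the other nodes after it.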


Thanks,

Kazutaka