<div dir="ltr">Thanks, that's what I suspected. I'll use the shutdown command from now on. :-)<div><br></div><div>To your other question, I'm running the 0.6.0 version--I know, it's way out of date! But I am trying to get sheepdog integrated as an optional configuration for Devstack, and I don't think the Devstack team likes including alternate PPAs or building other projects from source. That's the version that Ubuntu 13.10 supports; hopefully Devstack will bump itself up to Trusty soon.</div>
<div><br></div><div>~ Scott</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, Apr 17, 2014 at 9:41 PM, Hitoshi Mitake <span dir="ltr">&lt;<a href="mailto:mitake.hitoshi@gmail.com" target="_blank">mitake.hitoshi@gmail.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="">On Fri, Apr 18, 2014 at 10:16 AM, Scott Devoid &lt;<a href="mailto:devoid@anl.gov">devoid@anl.gov</a>&gt; wrote:<br>
> Thanks Hitoshi,<br>
><br>
> So I am seeing some interesting behavior when I try to shut down and restart<br>
> my 3-node cluster:<br>
><br>
> $ for pid in `pgrep sheep`; do kill -15 $pid; sleep 2; done<br>
> $ for i in 0 1 2; do sheep -c local -d /path/to/store/$i -z $i -p 700$i;<br>
> sleep 1; done<br>
><br>
> Shutdown works fine, but when I start the cluster back up, the first member<br>
> fails as soon as the second joins. I think this is because the second member<br>
> has a later epoch than the first.<br>
><br>
> Here is the tail of the first member's log:<br>
> <a href="http://paste.openstack.org/show/76195/" target="_blank">http://paste.openstack.org/show/76195/</a><br>
><br>
> Let me know if I am doing things incorrectly.<br>
<br>
</div>A brief follow-up:<br>
<br>
As you say, the problem is caused by the difference in epoch numbers.<br>
<br>
In detail, the sequence is as follows:<br>
1. Sheep A, B, and C form a cluster.<br>
2. The kill command stops A, and B and C are notified that A has left. B<br>
and C therefore update their membership information (called the epoch).<br>
3. The for loop then kills B and C at 2-second intervals.<br>
4. On restart, the second for loop starts A first.<br>
5. The loop then starts B. B's epoch is newer than A's, so A exits<br>
voluntarily because it does not know the latest membership.<br>
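<br>
The sequence above can be sketched as a small toy model in shell. It is only an<br>
illustration, not sheepdog's actual join logic: the starting epoch of 1, the<br>
rule that every surviving node bumps its epoch by one per departure, and the<br>
helper names kill_node and join_result are all assumptions made for this sketch.<br>
<pre>
```shell
#!/bin/sh
# Toy model of the kill/restart sequence; NOT sheepdog's real implementation.
# Assumption: every node starts at epoch 1 and bumps it once per departure.
epoch_A=1; epoch_B=1; epoch_C=1
alive="A B C"

# Kill one node: each surviving node notices the departure and bumps its epoch.
kill_node() {
    victim=$1
    survivors=""
    for n in $alive; do
        if [ "$n" != "$victim" ]; then
            survivors="$survivors $n"
            eval "epoch_$n=\$(( epoch_$n + 1 ))"
        fi
    done
    alive=$survivors
}

# Print the last epoch a node recorded before it stopped.
get_epoch() {
    eval "echo \$epoch_$1"
}

# A joiner contacts a running node: the running node exits if it is behind.
join_result() {
    running_epoch=$(get_epoch "$1")
    joiner_epoch=$(get_epoch "$2")
    if [ "$joiner_epoch" -gt "$running_epoch" ]; then
        echo "$1 exits: epoch $running_epoch is older than $2's epoch $joiner_epoch"
    else
        echo "$2 joins $1 at epoch $running_epoch"
    fi
}

# Steps 2-3: the kill loop stops the nodes one at a time.
for node in A B C; do
    kill_node "$node"
done
echo "stored epochs: A=$epoch_A B=$epoch_B C=$epoch_C"

# Steps 4-5: A is restarted first, then B tries to join it.
join_result A B
```
</pre>
In this model the nodes end up with different stored epochs purely because they<br>
were stopped one after another, which is why stopping the whole cluster in one<br>
step should avoid the mismatch.<br>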
<br>
Thanks,<br>
Hitoshi<br>
</blockquote></div><br></div>