[sheepdog-users] Single-node sheepdog for testing

Fri Apr 18 04:41:30 CEST 2014

On Fri, Apr 18, 2014 at 10:16 AM, Scott Devoid <devoid at anl.gov> wrote:
> Thanks Hitoshi,
>
> So I am seeing some interesting behavior when I try to shut down and restart
> my 3-node cluster:
>
> $ for pid in `pgrep sheep`; do kill -15 $pid; sleep 2; done
> $ for i in 0 1 2; do sheep -c local -d /path/to/store/$i -z $i -p 700$i;
> sleep 1; done
>
> Shutdown works fine, but when I go to start the cluster up the first member
> fails when the second joins. I think this is because the second member has a
> later epoch than the first.
>
> Here is the tail of the first member logs:
> http://paste.openstack.org/show/76195/
>
> Let me know if I am doing things incorrectly.

A brief follow-up:

As you say, the problem is caused by the difference in epoch numbers.

The problem unfolds as follows:
1. Sheep A, B, and C form a cluster.
2. The kill command kills A, and B and C are notified of its death, so B
and C update their membership (called the epoch).
3. The first for loop then kills B and C at 2-second intervals.
4. On restart, the second for loop starts A first.
5. The loop then starts B. B's membership is newer than A's, so A exits
voluntarily because it doesn't know the latest membership.
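
To make the sequence above concrete, here is a toy model in Python. It is
only a sketch and not sheepdog's actual code: the function names are made
up for illustration, and it assumes that every departure bumps the epoch
on each surviving node and that a running node exits as soon as a joiner
carries a newer epoch.

```python
# Toy model of the epoch bookkeeping described above (hypothetical names,
# not sheepdog's implementation).

def shutdown_one_by_one(nodes, epoch=1):
    """Kill nodes in list order and return each node's last recorded epoch.

    Assumption: every surviving node bumps its epoch when a member leaves.
    """
    alive = list(nodes)
    last_epoch = {name: epoch for name in nodes}
    for victim in nodes:
        alive.remove(victim)
        # Survivors are notified of the departure and record a new epoch.
        for name in alive:
            last_epoch[name] += 1
    return last_epoch


def restart_in_order(order, last_epoch):
    """Restart nodes in the given order and return the nodes that exit.

    Assumption: a running node exits when a joiner has a newer epoch.
    """
    running, exited = [], []
    for joiner in order:
        for name in list(running):
            if last_epoch[name] < last_epoch[joiner]:
                running.remove(name)
                exited.append(name)
        running.append(joiner)
    return exited


epochs = shutdown_one_by_one(["A", "B", "C"])
print(epochs)                                # {'A': 1, 'B': 2, 'C': 3}
print(restart_in_order(["A", "B"], epochs))  # ['A']
```

In this model A is killed first, so it never sees the later epochs, and it
is the node that exits once B joins with a newer one.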

Thanks,
Hitoshi