[sheepdog-users] Locking problems on 0.9

Micha Kersloot micha at kovoks.nl
Wed Dec 3 11:24:57 CET 2014


Hi Hitoshi,

thank you for your reply and your time; I was just getting back to this sheepdog issue. I followed the procedure below to try to upgrade from 0.8 to 0.9, which ended in the current state. I hope you can follow the steps and tell me where it went wrong:

-------------------------------------
Alternate ZooKeeper
-------------------------------------

Not for production, so only on 1 server.

#cp /etc/init.d/zookeeper /etc/init.d/zookeeper2
#cp -R /etc/zookeeper /etc/zookeeper2
#cp /etc/default/zookeeper /etc/default/zookeeper2
#rm /etc/zookeeper2/conf
#ln -s /etc/zookeeper2/conf_example /etc/zookeeper2/conf
#mkdir -p /var/log/zookeeper2/
#chown zookeeper:zookeeper /var/log/zookeeper2
#mkdir -p /var/lib/zookeeper2/
#chown zookeeper:zookeeper /var/lib/zookeeper2
#ln -s /etc/zookeeper2/conf/myid /var/lib/zookeeper2/myid
#vi /etc/init.d/zookeeper2
Replace zookeeper with zookeeper2 in the following lines:

[ -r "/etc/zookeeper2/conf/environment" ] || exit 0
. /etc/zookeeper2/conf/environment
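
As an alternative to editing by hand, a sed substitution should give the same result (a sketch; it assumes the copied Debian init script only references /etc/zookeeper/conf in these places):

#sed -i 's|/etc/zookeeper/conf|/etc/zookeeper2/conf|g' /etc/init.d/zookeeper2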

#vi /etc/zookeeper2/conf/environment

NAME=zookeeper2
ZOOCFGDIR=/etc/$NAME/conf
CLASSPATH="$ZOOCFGDIR:/usr/share/java/jline.jar:/usr/share/java/log4j-1.2.jar:/usr/share/java/xercesImpl.jar:/usr/share/java/xmlParserAPIs.jar:/usr/share/java/zookeeper.jar" 

ZOOCFG="$ZOOCFGDIR/zoo.cfg" 
ZOO_LOG_DIR=/var/log/$NAME
USER=zookeeper
GROUP=zookeeper
PIDDIR=/var/run/$NAME
PIDFILE=$PIDDIR/$NAME.pid
SCRIPTNAME=/etc/init.d/$NAME
JAVA=/usr/bin/java
ZOOMAIN="org.apache.zookeeper.server.quorum.QuorumPeerMain" 
ZOO_LOG4J_PROP="INFO,ROLLINGFILE" 
JMXLOCALONLY=false
JAVA_OPTS="" 

#vi /etc/zookeeper2/conf/zoo.cfg

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper2
clientPort=2182
server.30=10.10.0.30:2889:3889

ZooKeeper is running with the alternate configuration without problems.
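
A quick sanity check that the alternate instance answers on its own port is ZooKeeper's "ruok" four-letter command (a sketch; needs netcat installed):

#echo ruok | nc localhost 2182

It should reply "imok".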

-------------------------------------
Alternative sheepdog
-------------------------------------

On every server used for the conversion (3 servers in my case):

#mkdir -p /mnt/sheep/0.9
#mkdir -p /var/lib/sheepdog/0.9
#dpkg-deb -x sheepdog_0.9.0-1_amd64.deb /root/sheep/
#/root/sheep/usr/sbin/sheep -y 10.10.0.30 -p 7001 -c zookeeper:10.10.0.30:2182 -n /var/lib/sheepdog/0.9,/mnt/sheep/0.9/
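
Before converting, it may be worth confirming that all 3 temporary daemons joined (a sketch, using the same extracted binary):

#/root/sheep/usr/sbin/dog node list -p 7001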

-------------------------------------
Convert image
-------------------------------------

#qemu-img convert -O sheepdog sheepdog:kovoks-debian7-compile sheepdog:localhost:7001:kovoks-debian7-compile
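
The conversion result can then be verified on the 0.9 side (a sketch):

#/root/sheep/usr/sbin/dog vdi list -p 7001
#qemu-img info sheepdog:localhost:7001:kovoks-debian7-compile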

-------------------------------------
Testing
-------------------------------------

#/root/sheep/usr/sbin/dog vdi lock unlock kovoks-debian7-compile -p 7001
#vi /srv/kovoks-debian7-compile
Change

-drive file=sheepdog:kovoks-debian7-compile,if=virtio,index=0,boot=on,cache=writeback \

to
-drive file=sheepdog:localhost:7001:kovoks-debian7-compile,if=virtio,index=0,boot=on,cache=writeback \

Everything is working.
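
For extra confidence, the copied VDI can also be checked for object consistency on the 0.9 cluster (a sketch; vdi check verifies the replicas of each object):

#/root/sheep/usr/sbin/dog vdi check kovoks-debian7-compile -p 7001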

-------------------------------------
Replace the running 0.8 with 0.9 as a final test
-------------------------------------

Shut down the 0.8 cluster with:
#dog cluster shutdown

Shut down the 0.9 cluster with:
#/root/sheep/usr/sbin/dog cluster shutdown -p 7001

Shut down the alternate ZooKeeper.
Restart the ZooKeeper cluster on the 3 nodes.
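
On Debian that comes down to something like (a sketch; service names as set up above):

On 10.10.0.30 only:
#service zookeeper2 stop

On each of the 3 nodes:
#service zookeeper restart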

Start the new 0.9 cluster with:
#/root/sheep/usr/sbin/sheep -y 10.10.0.30 -c zookeeper:10.10.0.21:2181,10.10.0.22:2181,10.10.0.30:2181 -n /var/lib/sheepdog/0.9,/mnt/sheep/0.9
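
At this point the state of the new cluster can be checked on the default port (a sketch):

#/root/sheep/usr/sbin/dog cluster info
#/root/sheep/usr/sbin/dog node list
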
-------------------------------------


After that, the state is as described in my previous posts. You said before that this couldn't happen in a production cluster, but I upgraded the same way I would in production, so why couldn't it happen there?

Kind regards,

Micha Kersloot

Stay up to date and receive the latest tips about Zimbra/KovoKs Contact:
http://twitter.com/kovoks

KovoKs B.V. is registered under Chamber of Commerce (KvK) number: 11033334

----- Original Message -----
> From: "Hitoshi Mitake" <mitake.hitoshi at lab.ntt.co.jp>
> To: "KovoKs" <info at kovoks.nl>
> Cc: "Hitoshi Mitake" <mitake.hitoshi at gmail.com>, "Lista sheepdog user" <sheepdog-users at lists.wpkg.org>
> Sent: Wednesday, December 3, 2014 11:10:27
> Subject: Re: [sheepdog-users] Locking problems on 0.9

> At Thu, 20 Nov 2014 16:08:29 +0100 (CET),
> Micha Kersloot wrote:
>> 
>> Hi,
>> 
>> ----- Original Message -----
>> > From: "Hitoshi Mitake" <mitake.hitoshi at gmail.com>
>> > To: "Micha Kersloot" <micha at kovoks.nl>
>> > Cc: "Lista sheepdog user" <sheepdog-users at lists.wpkg.org>
>> > Sent: Thursday, November 20, 2014 3:54:01 PM
>> > Subject: Re: [sheepdog-users] Locking problems on 0.9
>> 
>> > On Tue, Nov 11, 2014 at 6:08 PM, Micha Kersloot <micha at kovoks.nl> wrote:
>> >> Hi Hitoshi,
>> >>
>> >> thank you for your time.
>> >>
>> >>
>> >> Cluster status: Waiting for other nodes to join cluster
>> >>
>> >> Cluster created at Tue Nov  4 14:22:03 2014
>> >>
>> >> Epoch Time           Version
>> >> 2014-11-04 16:55:02      9 [10.10.0.21:7001, 10.10.0.22:7001, 10.10.0.30:7001]
>> >> 2014-11-04 16:54:56      8 [10.10.0.21:7001, 10.10.0.30:7001]
>> >> 2014-11-04 16:54:33      7 [10.10.0.21:7001, 10.10.0.22:7001, 10.10.0.30:7001]
>> >> 1970-01-01 01:00:00      6 []
>> > 
>> > The above 6th epoch would be the root cause of the problem. An epoch
>> > with no nodes (clearly it cannot happen in a normal situation) can
>> > wipe data under sheepdog's recovery logic.
>> > I'll prepare a patch to avoid the creation of such an epoch later.
>> > 
>> > BTW, can you see such an epoch with no nodes on the other sheep daemons?
>> > 
>> 
>> That would be on 10.10.0.21:
>> 
>> Cluster status: Waiting for other nodes to join cluster
>> 
>> Cluster created at Tue Nov  4 14:22:03 2014
>> 
>> Epoch Time           Version
>> 2014-11-04 16:55:02      9 [10.10.0.21:7001, 10.10.0.22:7001, 10.10.0.30:7001]
>> 2014-11-04 16:54:56      8 [10.10.0.21:7001, 10.10.0.30:7001]
>> 2014-11-04 16:54:33      7 [10.10.0.21:7001, 10.10.0.22:7001, 10.10.0.30:7001]
>> 2014-11-04 16:54:18      6 [10.10.0.21:7001, 10.10.0.22:7001]
>> 2014-11-04 16:52:45      5 [10.10.0.21:7001, 10.10.0.22:7001, 10.10.0.30:7001]
>> 1970-01-01 01:00:00      4 []
>> 2014-11-04 16:47:43      3 [10.10.0.21:7001, 10.10.0.22:7001, 10.10.0.30:7001]
>> 2014-11-04 16:46:43      2 [10.10.0.21:7001, 10.10.0.30:7001]
>> 1970-01-01 01:00:00      1 []
>> 
>> 
>> 
>> 
>> ---------
>> on 10.10.0.22:
>> Cluster status: Waiting for other nodes to join cluster
>> 
>> Cluster created at Tue Nov  4 14:22:03 2014
>> 
>> Epoch Time           Version
>> 2014-11-04 16:55:02      9 [10.10.0.21:7001, 10.10.0.22:7001, 10.10.0.30:7001]
>> 1970-01-01 01:00:00      8 []
>> 2014-11-04 16:54:33      7 [10.10.0.21:7001, 10.10.0.22:7001, 10.10.0.30:7001]
>> 2014-11-04 16:54:18      6 [10.10.0.21:7001, 10.10.0.22:7001]
>> 2014-11-04 16:52:45      5 [10.10.0.21:7001, 10.10.0.22:7001, 10.10.0.30:7001]
>> 2014-11-04 16:52:32      4 [10.10.0.22:7001, 10.10.0.30:7001]
>> 2014-11-04 16:47:43      3 [10.10.0.21:7001, 10.10.0.22:7001, 10.10.0.30:7001]
>> 1970-01-01 01:00:00      2 []
>> 1970-01-01 01:00:00      1 []
>> 
>> 
>> ---------------
>> on 10.10.0.30:
>> Cluster status: Waiting for other nodes to join cluster
>> 
>> Cluster created at Tue Nov  4 14:22:03 2014
>> 
>> Epoch Time           Version
>> 2014-11-04 16:55:02      9 [10.10.0.21:7001, 10.10.0.22:7001, 10.10.0.30:7001]
>> 2014-11-04 16:54:56      8 [10.10.0.21:7001, 10.10.0.30:7001]
>> 2014-11-04 16:54:33      7 [10.10.0.21:7001, 10.10.0.22:7001, 10.10.0.30:7001]
>> 1970-01-01 01:00:00      6 []
>> 2014-11-04 16:52:45      5 [10.10.0.21:7001, 10.10.0.22:7001, 10.10.0.30:7001]
>> 2014-11-04 16:52:32      4 [10.10.0.22:7001, 10.10.0.30:7001]
>> 2014-11-04 16:47:43      3 [10.10.0.21:7001, 10.10.0.22:7001, 10.10.0.30:7001]
>> 2014-11-04 16:46:43      2 [10.10.0.21:7001, 10.10.0.30:7001]
>> 2014-11-04 14:22:03      1 [10.10.0.30:7001]
>> 
> 
> Thanks for the additional information; it is helpful.
> 
> Thanks,
> Hitoshi


