At Wed, 29 May 2013 20:38:39 +0800,
Kai Zhang wrote:
>
> Is there a way that sheep can rejoin cluster other than panic?
> Because currently sheep panic will cause a qemu restart which should be avoided in production environment.

Although ZooKeeper easily causes a timeout, the problem is not specific
to the zookeeper driver.  I think this should be fixed in sheep/group.c.

The naive approach I'm trying to implement is:

 1. add a function like 'sd_timeout_handler' to group.c, which will be
    called when a timeout is detected in the cluster driver.
 2. become a gateway node after sd_timeout_handler() is called.
 3. try to rejoin several times, and exit the program if sheep cannot
    join Sheepdog again.

Thanks,

Kazutaka
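
To make steps 1-3 concrete, here is a rough sketch in C. It is only an illustration of the control flow, not actual sheep code: apart from the name sd_timeout_handler, everything here (the node_mode enum, struct node, the rejoin_fn callback, the REJOIN_RETRIES limit of 3, and the rejoin_after helper) is hypothetical. It also assumes that the node returns to normal operation after a successful rejoin, which the steps above do not spell out. A real implementation would presumably schedule the retries from the event loop rather than loop synchronously.

```c
#include <stdbool.h>

/* Hypothetical node states; these are NOT sheep's real status values. */
enum node_mode { MODE_NORMAL, MODE_GATEWAY, MODE_EXIT };

/* Assumed retry limit; "several times" is not quantified in the proposal. */
#define REJOIN_RETRIES 3

/* Hypothetical stand-in for the cluster driver's join call.
 * Returns true when the rejoin succeeded. */
typedef bool (*rejoin_fn)(void *ctx);

struct node {
	enum node_mode mode;
	int attempts;		/* rejoin attempts made by the last recovery */
};

/*
 * Step 1: called by the cluster driver when it detects a timeout.
 * Step 2: the node becomes a gateway instead of panicking.
 * Step 3: try to rejoin a bounded number of times; if every attempt
 *         fails, report MODE_EXIT so the caller can exit the program.
 */
enum node_mode sd_timeout_handler(struct node *n, rejoin_fn try_rejoin,
				  void *ctx)
{
	n->mode = MODE_GATEWAY;
	n->attempts = 0;

	for (int i = 0; i < REJOIN_RETRIES; i++) {
		n->attempts++;
		if (try_rejoin(ctx)) {
			/* Assumption: a rejoined node resumes normal operation. */
			n->mode = MODE_NORMAL;
			return n->mode;
		}
	}

	n->mode = MODE_EXIT;
	return n->mode;
}

/* Hypothetical rejoin callback for experiments: fails until the counter
 * pointed to by ctx reaches zero, then succeeds. */
bool rejoin_after(void *ctx)
{
	int *failures_left = ctx;

	if (*failures_left > 0) {
		(*failures_left)--;
		return false;
	}
	return true;
}
```

With rejoin_after and a counter of 2, the handler succeeds on the third attempt and leaves the node in MODE_NORMAL. With a counter larger than REJOIN_RETRIES, it gives up and returns MODE_EXIT, at which point the caller would exit instead of panicking in the middle of the driver.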