[sheepdog] [PATCH 0/3] zookeeper: fix error handling
MORITA Kazutaka
morita.kazutaka at gmail.com
Thu May 30 15:16:48 CEST 2013
At Wed, 29 May 2013 20:38:39 +0800,
Kai Zhang wrote:
>
> Is there a way for sheep to rejoin the cluster instead of panicking?
> Currently a sheep panic causes a qemu restart, which should be avoided in a production environment.
Although ZooKeeper is prone to timeouts, the problem is not specific
to the zookeeper driver. I think this should be fixed in
sheep/group.c.
The naive approach I'm trying to implement is:
1. add a function like 'sd_timeout_handler' to group.c, which will be
   called when the cluster driver detects a timeout.
2. become a gateway node after sd_timeout_handler() is called.
3. try to rejoin several times, and exit the program if sheep still
   cannot join Sheepdog again.
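To make the three steps concrete, here is a rough, standalone sketch. It is not the actual patch: 'struct node', 'rejoin_fn' and 'stub_rejoin' are hypothetical names made up for illustration, and only 'sd_timeout_handler' comes from the proposal above. The sketch returns an error instead of exiting so that the caller decides how to terminate, and it leaves open what happens to the gateway state after a successful rejoin, since that is not decided yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical driver hook: returns 0 when the rejoin succeeds. */
typedef int (*rejoin_fn)(void *ctx);

/* Hypothetical, minimal node state; not the real sheep data structures. */
struct node {
	bool gateway;     /* true once the node has fallen back to gateway mode */
	int max_retries;  /* how many rejoin attempts to make before giving up */
};

/*
 * Sketch of the proposed handler, called by a cluster driver on timeout.
 * Returns 0 if the node rejoined, -1 if every attempt failed. In the
 * latter case the caller is expected to exit the program (step 3).
 */
int sd_timeout_handler(struct node *n, rejoin_fn rejoin, void *ctx)
{
	assert(n != NULL && rejoin != NULL);

	/* Step 2: become a gateway node. */
	n->gateway = true;

	/* Step 3: try to rejoin a bounded number of times. */
	for (int i = 0; i < n->max_retries; i++) {
		if (rejoin(ctx) == 0)
			return 0;
	}
	return -1;
}

/* Stub standing in for a driver: fails until the counter reaches zero. */
int stub_rejoin(void *ctx)
{
	int *failures_left = ctx;

	if (*failures_left > 0) {
		(*failures_left)--;
		return -1;
	}
	return 0;
}
```

A driver would call sd_timeout_handler(&node, driver_rejoin, driver_ctx) from its timeout path and exit only when it returns -1. A real implementation would presumably also wait between attempts, which this sketch leaves out.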
Thanks,
Kazutaka