[sheepdog] zookeeper quitting unexpectedly causes recovering from journal file failed

Liu Yuan namei.unix at gmail.com
Thu May 23 07:37:06 CEST 2013


On 05/23/2013 01:29 PM, Hongyi Wang wrote:
> Hi, 
> 
> This is followed by our last test. One sheep node in our cluster
> connected zookeeper timeout so we tried to restart sheep on the node.
> However, the sheep cannot be started successfully, 
> I am not sure if zk connection timeout could somehow causes recovering
> from journal file failed? Is this a bug of journal replay?

I guess it is a bug of journal replay for some corner cases. Pass 'skip'
for -j or simply remove files in journal dir will start the sheep again.

Thanks,
Yuan



More information about the sheepdog mailing list