At Fri, 16 Dec 2011 06:00:02 +0900,
MORITA Kazutaka wrote:
>
> At Thu, 15 Dec 2011 20:42:09 +0000,
> Chris Webb wrote:
> >
> > MORITA Kazutaka <morita.kazutaka at lab.ntt.co.jp> writes:
> >
> > > Probably, it is a bug of Sheepdog. Is there an easy way to reproduce
> > > it with a small cluster? I'd like to try to test it, too.
> >
> > Hi Kazutaka. I just started a small cluster of three machines (with three
> > sheep per machine on three different drives, but I'm sure it would work just
> > as well with only one), did a cluster format with --copies=2, and wrote a
> > vdi to the cluster so I had something to test with.
> >
> > I then (effectively---actually did an ip link set ethX down) unplugged the
> > network to one of the machines. When I did a collie vdi list on one of the
> > machines in the remaining cluster, it paused until it noticed the machine
> > had gone, then continued correctly. However, the sheep daemon never seemed
> > to exit on the machine that had been disconnected, and collie vdi list just
> > hung forever. It seems to happen this way every time so is probably very
> > easy to reproduce.
>
> I tried it with the master branch just now, but the sheep exited
> correctly on my environment. Can you give me the sheep.log of the
> disconnected sheep?

I've sent some fixes related to network failure. Could you try the
devel branch again?

Thanks,

Kazutaka