Hi,

yesterday I ran into some strange behavior of my sheepdog cluster. The number of copies is set to two:

# collie node list
   M   Id   Host:Port          V-Nodes         Zone
   -    0   10.0.1.61:7000           0   1023475722
   -    1   10.0.1.61:7001          64   1023475722
   -    2   10.0.1.62:7000           0   1040252938
   -    3   10.0.1.62:7001          64   1040252938
   -    4   10.0.1.62:7002          64   1040252938
   -    5   10.0.1.62:7003          64   1040252938
   -    6   10.0.1.63:7000           0   1057030154
   -    7   10.0.1.63:7001          64   1057030154
   -    8   10.0.1.63:7002          64   1057030154
   -    9   10.0.1.63:7003          64   1057030154

I had to shut down 10.0.1.62, and the other two servers started recovering immediately. While the sheep on 10.0.1.61 was still recovering, the failed node came back and its sheep daemons were started as well. At that point the whole cluster seemed to hang: collie node info returned only a few lines and the virtual machines could not access their images. Two hours later the recovery finished, the collie commands reacted normally again and I could start the virtual machines, but I discovered some strange behavior inside them...

The log file of the gateway sheep on 10.0.1.63 shows a lot of the following errors:

[..]
Jul 20 15:10:25 [gateway 0] forward_write_obj_req(188) fail 2
Jul 20 15:10:26 [gateway 2] forward_write_obj_req(188) fail 2
Jul 20 15:10:26 [gateway 3] forward_write_obj_req(188) fail 2
Jul 20 15:10:26 [gateway 1] forward_write_obj_req(188) fail 2
[..]

When I run a collie vdi check, most VDIs give an error message:

[...]
fix c956c0000022f success
fix c956c00000230 success
fix c956c00000231 success
Failed to read, No object found

Does anyone know whether this might be a bug that has already been fixed since my older version (0.3.0_431_g2361852), or can someone explain what happens in this situation?

Cheers
Bastian
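
P.S. In case it helps to reproduce: this is only a sketch of how the check can be run over several images, the names vm01 and vm02 are placeholders for the actual VDI names in the cluster, and collie vdi check is invoked exactly as above.

# sketch only: run the consistency check for a hand-written list of VDI names
for vdi in vm01 vm02; do
    collie vdi check "$vdi"
done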