[Sheepdog] Sheepdog+iscsi high availability

Tue Apr 17 13:50:19 CEST 2012

On 04/16/2012 07:29 PM, joby xavier wrote:

> when i shutdown my netwoking on "node a" or completely shutdown, ucarp
> switches its Virtual IP to "node b". so the communication of iscsi
> should done through "node b" , both nodes have same iqn.
> 
> Following are logs
> 
> *node a*
> 
> 
> Apr 16 16:50:42 connect_to(227) failed to connect to 192.168.1.91:7000: Network is unreachable
> Apr 16 16:50:42 connect_to(227) failed to connect to 192.168.1.222:7000: Network is unreachable
> Apr 16 16:50:42 connect_to(227) failed to connect to 192.168.1.117:7000: Network is unreachable
> Apr 16 16:50:42 check_majority(709) the majority of nodes are not alive
> Apr 16 16:50:42 __sd_leave(736) perhaps a network partition has occurred?
> Apr 16 16:50:42 log_sigexit(361) sheep pid 8954 exiting.
> 
> *node b*
> 
> 
> Apr 16 16:50:42 recover_object(1412) done:0 count:159, oid:65958b000000db
> Apr 16 16:50:48 fix_object_consistency(738) failed to read object 66
> Apr 16 16:50:48 fix_object_consistency(738) failed to read object 66
> Apr 16 16:50:49 fix_object_consistency(738) failed to read object 66
> Apr 16 16:50:49 fix_object_consistency(738) failed to read object 66
> Apr 16 16:50:50 fix_object_consistency(738) failed to read object 66
> Apr 16 16:50:50 fix_object_consistency(738) failed to read object 66
> Apr 16 16:50:50 fix_object_consistency(738) failed to read object 66
> Apr 16 16:50:51 fix_object_consistency(738) failed to read object 66
> Apr 16 16:50:51 fix_object_consistency(738) failed to read object 66
> Apr 16 16:50:51 connect_to(227) failed to connect to 192.168.1.29:7000: Connection refused
> Apr 16 16:50:51 recover_object_from_replica(1240) failed to connect to 192.168.1.29:7000
> Apr 16 16:50:51 do_recover_object(1363) can not recover oid 65958b000000db
> Apr 16 16:50:52 recover_object(1412) done:1 count:159, oid:65958b00000143
> Apr 16 16:50:52 connect_to(227) failed to connect to 192.168.1.29:7000: Connection refused
> Apr 16 16:50:52 recover_object_from_replica(1240) failed to connect to 192.168.1.29:7000
> Apr 16 16:50:52 do_recover_object(1363) can not recover oid 65958b00000143
> Apr 16 16:50:52 fix_object_consistency(738) failed to read object 66
> Apr 16 16:50:54 fix_object_consistency(738) failed to read object 66
> Apr 16 16:50:54 recover_object(1412) done:2 count:159, oid:65958b000000d6
> Apr 16 16:50:54 connect_to(227) failed to connect to 192.168.1.29:7000: Connection refused
> Apr 16 16:50:54 recover_object_from_replica(1240) failed to connect to 192.168.1.29:7000
> Apr 16 16:50:54 do_recover_object(1363) can not recover oid 65958b000000d6
> Apr 16 16:50:54 fix_object_consistency(738) failed to read object 66
> Apr 16 16:50:56 fix_object_consistency(738) failed to read object 2
> Apr 16 16:50:56 recover_object(1412) done:3 count:159, oid:65958b000000e7
> Apr 16 16:50:56 connect_to(227) failed to connect to 192.168.1.29:7000: Connection refused
> Apr 16 16:50:56 recover_object_from_replica(1240) failed to connect to 192.168.1.29:7000
> Apr 16 16:50:56 do_recover_object(1363) can not recover oid 65958b000000e7
> Apr 16 16:50:56 fix_object_consistency(738) failed to read object 2

From this log we can see that node B is in recovery and meanwhile
receives a lot of READ requests. Most of these read requests fail, so
EIO is returned to the upper layer, which breaks it.

Data, including the 'copies', is recovered during the recovery stage to
ensure data reliability. But this causes trouble for in-flight I/O
(requests block, or even get EIO on some of the targeted data).

Could you please tell me which commit of the source code you are using?

To mitigate the problem, I'd suggest using the 'object cache' for the
VM, because the current object cache layer is somewhat more robust at
serving requests during the recovery stage.

If you really need strong consistency (i.e. do not use the object
cache), I think some of the sheep code needs fixing to mitigate the
problem (it needs to retry other copies instead of returning EIO
immediately). Unfortunately, that would not get rid of the problem
entirely.
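
As a rough illustration of the "retry other copies" idea, here is a
minimal sketch in Python rather than sheep's C. It is not sheep's actual
code: read_object, read_from and fake_read are hypothetical names, and
the replica list is simply assumed to be known to the caller. The read
only fails with EIO once every copy has been tried.

```python
import errno


def read_object(oid, replicas, read_from):
    """Try each replica in turn; raise EIO only if every copy fails.

    `replicas` is a list of node addresses and `read_from(node, oid)` is a
    caller-supplied function (both hypothetical, for illustration only).
    """
    last_err = None
    for node in replicas:
        try:
            return read_from(node, oid)
        except OSError as err:
            # This copy is unreachable or unreadable: remember why and
            # fall through to the next copy instead of failing right away.
            last_err = err
    raise OSError(errno.EIO, "all %d copies of %x failed" % (len(replicas), oid)) from last_err


def fake_read(node, oid):
    """Stand-in for a network read: the node that left the cluster refuses."""
    if node == "192.168.1.29:7000":
        raise ConnectionRefusedError(errno.ECONNREFUSED, "Connection refused")
    return b"data-from-" + node.encode()


if __name__ == "__main__":
    nodes = ["192.168.1.29:7000", "192.168.1.91:7000"]
    print(read_object(0x65958B000000DB, nodes, fake_read))
```

This only helps while at least one copy is still reachable; if all
copies of an object are gone or not yet recovered, the caller still
sees EIO.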

Thanks,
Yuan