[Sheepdog] Sheepdog+iscsi high availability
Huxinwei
huxinwei at huawei.com
Tue Apr 17 03:10:33 CEST 2012
How many replicas (copies) do you have in the cluster?
The log indicates that recovery failed due to “No object found”.
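One way to quantify that from a sheep log is to count the objects that recovery attempted against those it gave up on. The following is a minimal Python sketch, not part of Sheepdog. It assumes the log line formats quoted in this thread, and it assumes an object counts as failed only when a "can not recover oid" line names it. The function name summarize_recovery and the embedded sample lines are for illustration only.

```python
import re

# Patterns follow the sheep log lines quoted in this thread (an assumption
# about the format; other Sheepdog versions may log differently).
RECOVER_RE = re.compile(
    r"recover_object\(\d+\) done:(\d+) count:(\d+), oid:([0-9a-f]+)"
)
FAILED_RE = re.compile(r"do_recover_object\(\d+\) can not recover oid ([0-9a-f]+)")


def summarize_recovery(lines):
    """Return (attempted, failed) lists of object IDs found in the log lines."""
    attempted, failed = [], []
    for line in lines:
        match = RECOVER_RE.search(line)
        if match:
            attempted.append(match.group(3))
            continue
        match = FAILED_RE.search(line)
        if match:
            failed.append(match.group(1))
    return attempted, failed


# A few lines copied from the node b log in this thread.
SAMPLE = """\
Apr 16 16:51:00 recover_object(1412) done:5 count:159, oid:65958b000000dc
Apr 16 16:51:00 do_recover_object(1363) can not recover oid 65958b000000dc
Apr 16 16:51:01 recover_object(1412) done:7 count:159, oid:65958b00000145
Apr 16 16:51:01 recover_object(1412) done:8 count:159, oid:65958b0000017b
""".splitlines()

attempted, failed = summarize_recovery(SAMPLE)
print(f"attempted={len(attempted)} failed={len(failed)}")
for oid in failed:
    print("unrecovered:", oid)
```

Running the same function over the full sheep log from each node would show how many of the 159 objects could not be recovered.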
From: joby xavier [mailto:jobycxa at gmail.com]
Sent: Monday, April 16, 2012 7:30 PM
To: Huxinwei
Cc: sheepdog at lists.wpkg.org
Subject: Re: [Sheepdog] Sheepdog+iscsi high availability
When I shut down networking on "node a", or shut the node down completely, ucarp switches its virtual IP to "node b", so iSCSI communication should then go through "node b". Both nodes have the same IQN.
The logs follow.
node a
Apr 16 16:50:42 connect_to(227) failed to connect to 192.168.1.91:7000: Network is unreachable
Apr 16 16:50:42 connect_to(227) failed to connect to 192.168.1.222:7000: Network is unreachable
Apr 16 16:50:42 connect_to(227) failed to connect to 192.168.1.117:7000: Network is unreachable
Apr 16 16:50:42 check_majority(709) the majority of nodes are not alive
Apr 16 16:50:42 __sd_leave(736) perhaps a network partition has occurred?
Apr 16 16:50:42 log_sigexit(361) sheep pid 8954 exiting.
node b
Apr 16 16:50:42 recover_object(1412) done:0 count:159, oid:65958b000000db
Apr 16 16:50:48 fix_object_consistency(738) failed to read object 66
Apr 16 16:50:48 fix_object_consistency(738) failed to read object 66
Apr 16 16:50:49 fix_object_consistency(738) failed to read object 66
Apr 16 16:50:49 fix_object_consistency(738) failed to read object 66
Apr 16 16:50:50 fix_object_consistency(738) failed to read object 66
Apr 16 16:50:50 fix_object_consistency(738) failed to read object 66
Apr 16 16:50:50 fix_object_consistency(738) failed to read object 66
Apr 16 16:50:51 fix_object_consistency(738) failed to read object 66
Apr 16 16:50:51 fix_object_consistency(738) failed to read object 66
Apr 16 16:50:51 connect_to(227) failed to connect to 192.168.1.29:7000: Connection refused
Apr 16 16:50:51 recover_object_from_replica(1240) failed to connect to 192.168.1.29:7000
Apr 16 16:50:51 do_recover_object(1363) can not recover oid 65958b000000db
Apr 16 16:50:52 recover_object(1412) done:1 count:159, oid:65958b00000143
Apr 16 16:50:52 connect_to(227) failed to connect to 192.168.1.29:7000: Connection refused
Apr 16 16:50:52 recover_object_from_replica(1240) failed to connect to 192.168.1.29:7000
Apr 16 16:50:52 do_recover_object(1363) can not recover oid 65958b00000143
Apr 16 16:50:52 fix_object_consistency(738) failed to read object 66
Apr 16 16:50:54 fix_object_consistency(738) failed to read object 66
Apr 16 16:50:54 recover_object(1412) done:2 count:159, oid:65958b000000d6
Apr 16 16:50:54 connect_to(227) failed to connect to 192.168.1.29:7000: Connection refused
Apr 16 16:50:54 recover_object_from_replica(1240) failed to connect to 192.168.1.29:7000
Apr 16 16:50:54 do_recover_object(1363) can not recover oid 65958b000000d6
Apr 16 16:50:54 fix_object_consistency(738) failed to read object 66
Apr 16 16:50:56 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:56 recover_object(1412) done:3 count:159, oid:65958b000000e7
Apr 16 16:50:56 connect_to(227) failed to connect to 192.168.1.29:7000: Connection refused
Apr 16 16:50:56 recover_object_from_replica(1240) failed to connect to 192.168.1.29:7000
Apr 16 16:50:56 do_recover_object(1363) can not recover oid 65958b000000e7
Apr 16 16:50:56 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:56 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:56 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:56 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:56 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:56 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:57 fix_object_consistency(738) failed to read object 66
Apr 16 16:50:58 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 recover_object(1412) done:4 count:159, oid:65958b00000117
Apr 16 16:50:59 connect_to(227) failed to connect to 192.168.1.29:7000: Connection refused
Apr 16 16:50:59 recover_object_from_replica(1240) failed to connect to 192.168.1.29:7000
Apr 16 16:50:59 do_recover_object(1363) can not recover oid 65958b00000117
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2
Apr 16 16:51:00 recover_object(1412) done:5 count:159, oid:65958b000000dc
Apr 16 16:51:00 do_recover_object(1363) can not recover oid 65958b000000dc
Apr 16 16:51:00 recover_object(1412) done:6 count:159, oid:65958b000000cc
Apr 16 16:51:00 do_recover_object(1363) can not recover oid 65958b000000cc
Apr 16 16:51:01 recover_object(1412) done:7 count:159, oid:65958b00000145
Apr 16 16:51:01 recover_object(1412) done:8 count:159, oid:65958b0000017b
Apr 16 16:51:01 recover_object(1412) done:9 count:159, oid:65958b0000000b
Apr 16 16:51:01 recover_object(1412) done:10 count:159, oid:65958b000000d5
Apr 16 16:51:01 recover_object(1412) done:11 count:159, oid:65958b00000022
Apr 16 16:51:01 do_recover_object(1363) can not recover oid 65958b00000022
Apr 16 16:51:02 recover_object(1412) done:12 count:159, oid:65958b00000131
Apr 16 16:51:02 do_recover_object(1363) can not recover oid 65958b00000131
Apr 16 16:51:02 fix_object_consistency(738) failed to read object 2
Apr 16 16:51:03 recover_object(1412) done:13 count:159, oid:65958b00000101
Apr 16 16:51:03 do_recover_object(1363) can not recover oid 65958b00000101
Apr 16 16:51:04 recover_object(1412) done:14 count:159, oid:65958b00000159
Apr 16 16:51:04 do_recover_object(1363) can not recover oid 65958b00000159
Apr 16 16:51:05 recover_object(1412) done:15 count:159, oid:65958b00000115
Apr 16 16:51:05 recover_object(1412) done:16 count:159, oid:65958b000000f7
Apr 16 16:51:05 do_recover_object(1363) can not recover oid 65958b000000f7
Apr 16 16:51:06 recover_object(1412) done:17 count:159, oid:65958b000000c7
Apr 16 16:51:06 do_recover_object(1363) can not recover oid 65958b000000c7
Apr 16 16:51:06 fix_object_consistency(738) failed to read object 2
Apr 16 16:51:07 recover_object(1412) done:18 count:159, oid:65958b00000182
Apr 16 16:51:07 do_recover_object(1363) can not recover oid 65958b00000182
Apr 16 16:51:08 recover_object(1412) done:19 count:159, oid:65958b00000129
Apr 16 16:51:08 do_recover_object(1363) can not recover oid 65958b00000129
Apr 16 16:52:39 fix_object_consistency(738) failed to read object 2
Apr 16 16:52:39 fix_object_consistency(738) failed to read object 2
Apr 16 16:52:39 fix_object_consistency(738) failed to read object 2
Apr 16 16:52:39 fix_object_consistency(738) failed to read object 2
Apr 16 16:52:39 fix_object_consistency(738) failed to read object 2
Apr 16 16:52:39 fix_object_consistency(738) failed to read object 2
Apr 16 16:52:39 fix_object_consistency(738) failed to read object 2
Apr 16 16:52:39 fix_object_consistency(738) failed to read object 2
Apr 16 16:52:39 fix_object_consistency(738) failed to read object 2
Apr 16 16:52:39 fix_object_consistency(738) failed to read object 2
Apr 16 16:52:39 fix_object_consistency(738) failed to read object 2
Apr 16 16:52:39 fix_object_consistency(738) failed to read object 2
Apr 16 16:52:39 fix_object_consistency(738) failed to read object 2
Apr 16 16:52:39 fix_object_consistency(738) failed to read object 2
Apr 16 16:52:39 fix_object_consistency(738) failed to read object 2
Apr 16 16:52:39 fix_object_consistency(738) failed to read object 2
Apr 16 16:52:39 fix_object_consistency(738) failed to read object 2
Apr 16 16:52:39 fix_object_consistency(738) failed to read object 2
Apr 16 16:52:39 fix_object_consistency(738) failed to read object 2
Apr 16 16:52:44 fix_object_consistency(738) failed to read object 2
Apr 16 16:52:46 fix_object_consistency(738) failed to read object 2
Apr 16 16:52:49 fix_object_consistency(738) failed to read object 2
Apr 16 16:52:49 fix_object_consistency(738) failed to read object 2
Apr 16 16:52:49 fix_object_consistency(738) failed to read object 2
Apr 16 16:52:49 fix_object_consistency(738) failed to read object 2
Apr 16 16:52:49 fix_object_consistency(738) failed to read object 2
Apr 16 16:59:39 fix_object_consistency(738) failed to read object 2
Apr 16 16:59:39 fix_object_consistency(738) failed to read object 2
Thanks,
Joby Xavier
On Mon, Apr 16, 2012 at 3:07 PM, Huxinwei <huxinwei at huawei.com> wrote:
When the failover failed, had you used the ucarp hook to restart the iSCSI target on ‘node b’?
Also, do you have logs from both target nodes? They would be very helpful.
Thanks.
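For readers unfamiliar with the hook mentioned above: ucarp can run a script when a node becomes master (its --upscript option), passing the interface name and the virtual IP as the first two arguments. The following is a minimal Python sketch of such a hook, not a tested configuration. The /24 prefix length, the service name "tgt", and the function names are assumptions for illustration; adjust them to the target daemon and network actually in use.

```python
#!/usr/bin/env python3
import subprocess
import sys


def takeover_commands(interface, vip, prefix_len=24, target_service="tgt"):
    """Build the commands to run when this node becomes the ucarp master.

    prefix_len and target_service are illustrative defaults, not values
    taken from the cluster described in this thread.
    """
    return [
        # Bring up the shared virtual IP on this node.
        ["ip", "addr", "add", f"{vip}/{prefix_len}", "dev", interface],
        # Restart the iSCSI target daemon on the new master.
        ["service", target_service, "restart"],
    ]


def run_hook(interface, vip):
    """Run each takeover command, stopping at the first failure."""
    for argv in takeover_commands(interface, vip):
        subprocess.run(argv, check=True)


# ucarp passes the interface name and the virtual IP as the first two
# arguments; without them the script does nothing.
if __name__ == "__main__" and len(sys.argv) >= 3:
    run_hook(sys.argv[1], sys.argv[2])
```

A matching down script (ucarp's --downscript) would remove the address again; it is omitted here.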
From: sheepdog-bounces at lists.wpkg.org [mailto:sheepdog-bounces at lists.wpkg.org] On Behalf Of joby xavier
Sent: Monday, April 16, 2012 4:59 PM
To: sheepdog at lists.wpkg.org
Subject: [Sheepdog] Sheepdog+iscsi high availability
Hi,
We would like to set up iSCSI high availability on top of Sheepdog distributed
storage.
Here is our setup: the OS is Ubuntu. Four nodes run Sheepdog
distributed storage, and we share this storage over iSCSI from
two of them, behind a virtual IP set up with ucarp. The two nodes
use the same IQN. We use the iSCSI storage as an LVM partition (sdc).
node a
node b
node c
node d
node x is the initiator
Nodes a and b share a virtual IP so that if 'node a' fails, 'node
b' serves as the iSCSI target; both have the same IQN.
Problem: when a failover happens, i.e. iSCSI switches from one node to
the other, the iSCSI disk fails on the initiator, 'node x'.
Here is /var/log/messages:
Apr 16 10:57:14 prox1 kernel: scsi7 : iSCSI Initiator over TCP/IP
Apr 16 10:57:14 prox1 kernel: scsi 7:0:0:0: RAID IET Controller 0001 PQ: 0 ANSI: 5
Apr 16 10:57:14 prox1 kernel: scsi 7:0:0:1: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Apr 16 10:57:14 prox1 kernel: sd 7:0:0:1: [sdc] 2252800 512-byte logical blocks: (1.15 GB/1.07 GiB)
Apr 16 10:57:14 prox1 kernel: sd 7:0:0:1: [sdc] Write Protect is off
Apr 16 10:57:14 prox1 kernel: sd 7:0:0:1: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Apr 16 10:57:14 prox1 kernel: sdc: unknown partition table
Apr 16 10:57:14 prox1 kernel: sd 7:0:0:1: [sdc] Attached SCSI disk
Apr 16 10:59:47 prox1 kernel: connection2:0: detected conn error (1020)
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Unhandled sense code
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Result: hostbyte=invalid driverbyte=DRIVER_SENSE
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Sense Key : Medium Error [current]
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Add. Sense: Unrecovered read error
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Unhandled sense code
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Result: hostbyte=invalid driverbyte=DRIVER_SENSE
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Sense Key : Medium Error [current]
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Add. Sense: Unrecovered read error
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Unhandled sense code
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Result: hostbyte=invalid driverbyte=DRIVER_SENSE
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Sense Key : Medium Error [current]
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Add. Sense: Unrecovered read error
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] CDB: Read(10): 28 00 00 00 00 08 00 00 08 00
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Unhandled sense code
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Result: hostbyte=invalid driverbyte=DRIVER_SENSE
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Sense Key : Medium Error [current]
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Add. Sense: Unrecovered read error
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Unhandled sense code
root@prox1:~# pvdisplay
/dev/sdc: read failed after 0 of 4096 at 1153368064: Input/output error
/dev/sdc: read failed after 0 of 4096 at 1153425408: Input/output error
Sheepdog with single-node iSCSI (https://github.com/collie/sheepdog/wiki/General-protocol-support) works well.
Should we change anything in lvm.conf? Should we use multipath-tools? Is this the right procedure?
Thanks,
Joby Xavier