When I shut down networking on "node a", or shut the node down completely, ucarp switches its virtual IP to "node b", so the iSCSI communication should then go through "node b". Both nodes have the same IQN. The logs follow.

*node a*

Apr 16 16:50:42 connect_to(227) failed to connect to 192.168.1.91:7000: Network is unreachable
Apr 16 16:50:42 connect_to(227) failed to connect to 192.168.1.222:7000: Network is unreachable
Apr 16 16:50:42 connect_to(227) failed to connect to 192.168.1.117:7000: Network is unreachable
Apr 16 16:50:42 check_majority(709) the majority of nodes are not alive
Apr 16 16:50:42 __sd_leave(736) perhaps a network partition has occurred?
Apr 16 16:50:42 log_sigexit(361) sheep pid 8954 exiting.

*node b*

Apr 16 16:50:42 recover_object(1412) done:0 count:159, oid:65958b000000db
Apr 16 16:50:48 fix_object_consistency(738) failed to read object 66
Apr 16 16:50:48 fix_object_consistency(738) failed to read object 66
Apr 16 16:50:49 fix_object_consistency(738) failed to read object 66
Apr 16 16:50:49 fix_object_consistency(738) failed to read object 66
Apr 16 16:50:50 fix_object_consistency(738) failed to read object 66
Apr 16 16:50:50 fix_object_consistency(738) failed to read object 66
Apr 16 16:50:50 fix_object_consistency(738) failed to read object 66
Apr 16 16:50:51 fix_object_consistency(738) failed to read object 66
Apr 16 16:50:51 fix_object_consistency(738) failed to read object 66
Apr 16 16:50:51 connect_to(227) failed to connect to 192.168.1.29:7000: Connection refused
Apr 16 16:50:51 recover_object_from_replica(1240) failed to connect to 192.168.1.29:7000
Apr 16 16:50:51 do_recover_object(1363) can not recover oid 65958b000000db
Apr 16 16:50:52 recover_object(1412) done:1 count:159, oid:65958b00000143
Apr 16 16:50:52 connect_to(227) failed to connect to 192.168.1.29:7000: Connection refused
Apr 16 16:50:52 recover_object_from_replica(1240) failed to connect to 192.168.1.29:7000
Apr 16 16:50:52 do_recover_object(1363) can not recover oid 
65958b00000143 Apr 16 16:50:52 fix_object_consistency(738) failed to read object 66 Apr 16 16:50:54 fix_object_consistency(738) failed to read object 66 Apr 16 16:50:54 recover_object(1412) done:2 count:159, oid:65958b000000d6 Apr 16 16:50:54 connect_to(227) failed to connect to 192.168.1.29:7000: Connection refused Apr 16 16:50:54 recover_object_from_replica(1240) failed to connect to 192.168.1.29:7000 Apr 16 16:50:54 do_recover_object(1363) can not recover oid 65958b000000d6 Apr 16 16:50:54 fix_object_consistency(738) failed to read object 66 Apr 16 16:50:56 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:56 recover_object(1412) done:3 count:159, oid:65958b000000e7 Apr 16 16:50:56 connect_to(227) failed to connect to 192.168.1.29:7000: Connection refused Apr 16 16:50:56 recover_object_from_replica(1240) failed to connect to 192.168.1.29:7000 Apr 16 16:50:56 do_recover_object(1363) can not recover oid 65958b000000e7 Apr 16 16:50:56 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:56 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:56 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:56 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:56 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:56 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:57 fix_object_consistency(738) failed to read object 66 Apr 16 16:50:58 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 recover_object(1412) done:4 count:159, oid:65958b00000117 Apr 16 16:50:59 connect_to(227) failed to connect to 192.168.1.29:7000: Connection refused Apr 16 16:50:59 recover_object_from_replica(1240) failed to connect to 192.168.1.29:7000 Apr 16 16:50:59 do_recover_object(1363) can not 
recover oid 65958b00000117 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 
Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:50:59 fix_object_consistency(738) failed to read object 2 Apr 16 16:51:00 recover_object(1412) done:5 count:159, oid:65958b000000dc Apr 16 16:51:00 do_recover_object(1363) can not recover oid 65958b000000dc Apr 16 16:51:00 recover_object(1412) done:6 count:159, oid:65958b000000cc Apr 16 16:51:00 do_recover_object(1363) can not recover oid 65958b000000cc Apr 16 16:51:01 recover_object(1412) done:7 count:159, oid:65958b00000145 Apr 16 16:51:01 recover_object(1412) done:8 count:159, oid:65958b0000017b Apr 16 16:51:01 recover_object(1412) done:9 count:159, oid:65958b0000000b Apr 16 16:51:01 recover_object(1412) done:10 count:159, oid:65958b000000d5 Apr 16 16:51:01 recover_object(1412) done:11 count:159, oid:65958b00000022 Apr 16 16:51:01 do_recover_object(1363) can not recover oid 65958b00000022 Apr 16 16:51:02 
recover_object(1412) done:12 count:159, oid:65958b00000131 Apr 16 16:51:02 do_recover_object(1363) can not recover oid 65958b00000131 Apr 16 16:51:02 fix_object_consistency(738) failed to read object 2 Apr 16 16:51:03 recover_object(1412) done:13 count:159, oid:65958b00000101 Apr 16 16:51:03 do_recover_object(1363) can not recover oid 65958b00000101 Apr 16 16:51:04 recover_object(1412) done:14 count:159, oid:65958b00000159 Apr 16 16:51:04 do_recover_object(1363) can not recover oid 65958b00000159 Apr 16 16:51:05 recover_object(1412) done:15 count:159, oid:65958b00000115 Apr 16 16:51:05 recover_object(1412) done:16 count:159, oid:65958b000000f7 Apr 16 16:51:05 do_recover_object(1363) can not recover oid 65958b000000f7 Apr 16 16:51:06 recover_object(1412) done:17 count:159, oid:65958b000000c7 Apr 16 16:51:06 do_recover_object(1363) can not recover oid 65958b000000c7 Apr 16 16:51:06 fix_object_consistency(738) failed to read object 2 Apr 16 16:51:07 recover_object(1412) done:18 count:159, oid:65958b00000182 Apr 16 16:51:07 do_recover_object(1363) can not recover oid 65958b00000182 Apr 16 16:51:08 recover_object(1412) done:19 count:159, oid:65958b00000129 Apr 16 16:51:08 do_recover_object(1363) can not recover oid 65958b00000129 Apr 16 16:52:39 fix_object_consistency(738) failed to read object 2 Apr 16 16:52:39 fix_object_consistency(738) failed to read object 2 Apr 16 16:52:39 fix_object_consistency(738) failed to read object 2 Apr 16 16:52:39 fix_object_consistency(738) failed to read object 2 Apr 16 16:52:39 fix_object_consistency(738) failed to read object 2 Apr 16 16:52:39 fix_object_consistency(738) failed to read object 2 Apr 16 16:52:39 fix_object_consistency(738) failed to read object 2 Apr 16 16:52:39 fix_object_consistency(738) failed to read object 2 Apr 16 16:52:39 fix_object_consistency(738) failed to read object 2 Apr 16 16:52:39 fix_object_consistency(738) failed to read object 2 Apr 16 16:52:39 fix_object_consistency(738) failed to read object 2 Apr 16 
16:52:39 fix_object_consistency(738) failed to read object 2 Apr 16 16:52:39 fix_object_consistency(738) failed to read object 2 Apr 16 16:52:39 fix_object_consistency(738) failed to read object 2 Apr 16 16:52:39 fix_object_consistency(738) failed to read object 2 Apr 16 16:52:39 fix_object_consistency(738) failed to read object 2 Apr 16 16:52:39 fix_object_consistency(738) failed to read object 2 Apr 16 16:52:39 fix_object_consistency(738) failed to read object 2 Apr 16 16:52:39 fix_object_consistency(738) failed to read object 2 Apr 16 16:52:44 fix_object_consistency(738) failed to read object 2 Apr 16 16:52:46 fix_object_consistency(738) failed to read object 2 Apr 16 16:52:49 fix_object_consistency(738) failed to read object 2 Apr 16 16:52:49 fix_object_consistency(738) failed to read object 2 Apr 16 16:52:49 fix_object_consistency(738) failed to read object 2 Apr 16 16:52:49 fix_object_consistency(738) failed to read object 2 Apr 16 16:52:49 fix_object_consistency(738) failed to read object 2 Apr 16 16:59:39 fix_object_consistency(738) failed to read object 2 Apr 16 16:59:39 fix_object_consistency(738) failed to read object 2 Thanks, Joby Xavier On Mon, Apr 16, 2012 at 3:07 PM, Huxinwei <huxinwei at huawei.com> wrote: > When the fail-over failed, have you used the hook for ucarp to restart > the scsi target on ‘node b’?**** > > Also, do you have logs from both target nodes. It’ll be very helpful.**** > > ** ** > > Thanks.**** > > ** ** > > *From:* sheepdog-bounces at lists.wpkg.org [mailto: > sheepdog-bounces at lists.wpkg.org] *On Behalf Of *joby xavier > *Sent:* Monday, April 16, 2012 4:59 PM > *To:* sheepdog at lists.wpkg.org > *Subject:* [Sheepdog] Sheepdog+iscsi high availability**** > > ** ** > > HI,**** > > We would like to set up a iscsi high availability with sheepdog > distributed > storage . **** > > Here is our system set up: OS - Ubuntu. 
> Four nodes run Sheepdog distributed storage. We share this storage over
> iSCSI from two of the nodes, behind a virtual IP set up with ucarp. The
> two nodes use the same IQN. We mounted the iSCSI storage as an LVM
> partition (sdc).
>
> node a
> node b
> node c
> node d
> node x is the initiator
>
> node a and node b share a common virtual IP because if 'node a' fails,
> 'node b' should serve as the iSCSI target; both have the same IQN.
>
> Problem: when a failover happens, i.e. iSCSI switches from node one to
> node two, the iSCSI disk fails on the initiator 'node x'.
>
> Here is /var/log/messages:
>
> Apr 16 10:57:14 prox1 kernel: scsi7 : iSCSI Initiator over TCP/IP
> Apr 16 10:57:14 prox1 kernel: scsi 7:0:0:0: RAID IET Controller 0001 PQ: 0 ANSI: 5
> Apr 16 10:57:14 prox1 kernel: scsi 7:0:0:1: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
> Apr 16 10:57:14 prox1 kernel: sd 7:0:0:1: [sdc] 2252800 512-byte logical blocks: (1.15 GB/1.07 GiB)
> Apr 16 10:57:14 prox1 kernel: sd 7:0:0:1: [sdc] Write Protect is off
> Apr 16 10:57:14 prox1 kernel: sd 7:0:0:1: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> Apr 16 10:57:14 prox1 kernel: sdc: unknown partition table
> Apr 16 10:57:14 prox1 kernel: sd 7:0:0:1: [sdc] Attached SCSI disk
>
> Apr 16 10:59:47 prox1 kernel: connection2:0: detected conn error (1020)
> Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Unhandled sense code
> Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Result: hostbyte=invalid driverbyte=DRIVER_SENSE
> Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Sense Key : Medium Error [current]
> Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Add. Sense: Unrecovered read error
> Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
> Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Unhandled sense code
> Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Result: hostbyte=invalid driverbyte=DRIVER_SENSE
> Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Sense Key : Medium Error [current]
> Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Add. Sense: Unrecovered read error
> Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
> Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Unhandled sense code
> Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Result: hostbyte=invalid driverbyte=DRIVER_SENSE
> Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Sense Key : Medium Error [current]
> Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Add. Sense: Unrecovered read error
> Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] CDB: Read(10): 28 00 00 00 00 08 00 00 08 00
> Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Unhandled sense code
> Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Result: hostbyte=invalid driverbyte=DRIVER_SENSE
> Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Sense Key : Medium Error [current]
> Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Add. Sense: Unrecovered read error
> Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
> Apr 16 10:59:51 prox1 kernel: sd 7:0:0:1: [sdc] Unhandled sense code
>
> root@prox1:~# pvdisplay
> /dev/sdc: read failed after 0 of 4096 at 1153368064: Input/output error
> /dev/sdc: read failed after 0 of 4096 at 1153425408: Input/output error
>
> Sheepdog with single-node iSCSI
> (https://github.com/collie/sheepdog/wiki/General-protocol-support) works
> well.
>
> Should we change anything in lvm.conf? Should we use multipath-tools? Is
> this the right procedure?
>
> Thanks,
>
> Joby Xavier
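For anyone reading long sheep logs like the ones at the top of this message, here is a minimal sketch that tallies entries per function and collects the object IDs reported as "can not recover". It is not part of Sheepdog: the function name `summarize_sheep_log`, the `ENTRY` regular expression, and the `SAMPLE` string are assumptions based only on the line format visible in this thread (timestamp, `function(line)`, message), and the sketch accepts entries separated by either spaces or newlines.

```python
import re
from collections import Counter

# Assumed entry layout, taken from the logs in this thread:
#   "Apr 16 16:50:51 connect_to(227) failed to connect to ..."
# The message runs until the next timestamp or the end of the text.
TIMESTAMP = r"[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}"
ENTRY = re.compile(
    r"(?P<ts>" + TIMESTAMP + r") "
    r"(?P<func>\w+)\((?P<line>\d+)\) "
    r"(?P<msg>.*?)(?=\s+" + TIMESTAMP + r" |\s*$)",
    re.S,
)


def summarize_sheep_log(text):
    """Return (entries per function, oids that could not be recovered)."""
    per_func = Counter()
    unrecovered = []
    for match in ENTRY.finditer(text):
        per_func[match["func"]] += 1
        if match["func"] == "do_recover_object":
            oid = re.search(r"oid (\w+)", match["msg"])
            if oid:
                unrecovered.append(oid.group(1))
    return per_func, unrecovered


# Four entries copied from the node b log in this thread.
SAMPLE = (
    "Apr 16 16:50:51 connect_to(227) failed to connect to "
    "192.168.1.29:7000: Connection refused "
    "Apr 16 16:50:51 recover_object_from_replica(1240) failed to connect "
    "to 192.168.1.29:7000 "
    "Apr 16 16:50:51 do_recover_object(1363) can not recover oid "
    "65958b000000db "
    "Apr 16 16:50:52 recover_object(1412) done:1 count:159, "
    "oid:65958b00000143"
)

if __name__ == "__main__":
    counts, failed = summarize_sheep_log(SAMPLE)
    for func, n in counts.most_common():
        print(func, n)
    print("unrecovered oids:", failed)
```

Running the same function over the full node b log would show how many recovery attempts failed against 192.168.1.29 versus how many `fix_object_consistency` read errors occurred, which may make the failover window easier to see.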