[stgt] stgtd 0.9.3 : Read-Errors using iser transport

Dr. Volker Jaenisch volker.jaenisch at inqbus.de
Fri Feb 20 12:03:02 CET 2009


Hi Or!

I tracked it down further. The co-factor seems to be SMP.
When booting the server (8 cores) with maxcpus=1 the read errors vanish
completely.

This is consistent with the findings of others:

http://lists.berlios.de/pipermail/stgt-devel/2008-February/001367.html

In that case multiple simultaneous reads were the problem. Please note
also that in February 2008 the git commit "iscsi: improve iser scheduling"
from Sept. 2008 did not yet exist. So the iser scheduling improvement may
have made things worse, but the original error is older.

Or Gerlitz schrieb:
>> I found that subsequent reads on a small timescale often succeed. Also after
>> a pause of a couple of seconds reads probably succeed. The timescale for
>> failure lies in between. Please try a time distribution that is more random.
>> How big is your sample? I used 1GB. When testing by hand, 1 out of 6 reads fails.
>> Same behavior on stgt 0.8 and 0.9.0.
>>     
>
> I use 1GB as well. It would be helpful if you provide me with a script
> that does these random timings between reads. Also, I noted that after
> one read, no I/O is going on any more on the target side, as this 1GB
> probably gets cached. My backing store is an sdb block device and I
> wasn't sure what yours is and whether you have caching at all; maybe this
> influences something.
>   
As the test target I use a 100GB LVM2 logical volume:
athene:~/tgt-0.9.0/usr# lvdisplay
  --- Logical volume ---
  LV Name                /dev/vg0/test
  VG Name                vg0
  LV UUID                85CXvj-YLoE-DrRt-mAVu-Ujaz-cLW4-RwQveP
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                100,00 GB
  Current LE             25600
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:3

This sequence is used to start the target:

athene:~/tgt-0.9.0/usr# ./tgtd
athene:~/tgt-0.9.0/usr# tgtadm --lld iscsi --op new --mode target --tid 1 -T de.inqbus.athene:test
athene:~/tgt-0.9.0/usr# tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 -b /dev/vg0/test
athene:~/tgt-0.9.0/usr# tgtadm --lld iscsi --op bind --mode target --tid 1 -I 10.1.3.0/24
athene:~/tgt-0.9.0/usr# ./tgtadm --m target --op show
Target 1: de.inqbus.athene:test
    System information:
        Driver: iscsi
        State: ready
    I_T nexus information:
        I_T nexus: 4
            Initiator: iqn.1993-08.org.debian:01:cb2c5d33d1f8
            Connection: 0
                RDMA IP Address: 10.1.3.33
    LUN information:
        LUN: 0
            Type: controller
            SCSI ID: deadbeaf1:0
            SCSI SN: beaf10
            Size: 0 MB
            Online: Yes
            Removable media: No
            Backing store: No backing store
        LUN: 1
            Type: disk
            SCSI ID: deadbeaf1:1
            SCSI SN: beaf11
            Size: 107374 MB
            Online: Yes
            Removable media: No
            Backing store: /dev/vg0/test
    Account information:
    ACL information:
        10.1.3.0/24


On the initiator I do:

ares:~# iscsiadm -m session
iscsiadm: No active sessions.
ares:~# iscsi_discovery 10.1.3.32 -tiser -l
iscsiadm: No active sessions.
Set target de.inqbus.athene:test to automatic login over iser to portal 
10.1.3.32:3260
discovered 1 targets at 10.1.3.32

The test script:

import random
import os
import time

# write a known pattern once (opat=1), then read it back repeatedly and let
# lmdd verify the pattern (ipat=1); mismatch=10 makes lmdd stop after
# reporting 10 mismatching words
writeCmd = 'lmdd if=internal of=/dev/sdc opat=1 bs=1M count=1000'
readCmd = 'lmdd of=internal if=/dev/sdc ipat=1 bs=1M count=1000 mismatch=10'

def tee( cmd ):
    # echo the command, then run it
    print cmd
    os.system( cmd )

tee( writeCmd )

while True:
#    sleepTime = random.randrange(0,10)   # random pause between reads
    sleepTime = 1                         # fixed 1 second pause for the runs below
    print 'sleeping %s seconds ..' % sleepTime
    time.sleep( sleepTime )

    tee( readCmd )
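
For the random timings Or asked about, a read-only variant with random pauses
would look like this (untested sketch; it assumes the pattern has already been
written by the script above):

import random
import os
import time

readCmd = 'lmdd of=internal if=/dev/sdc ipat=1 bs=1M count=1000 mismatch=10'

while True:
    sleepTime = random.randrange(0, 10)   # random 0..9 second pause between reads
    print 'sleeping %s seconds ..' % sleepTime
    time.sleep( sleepTime )
    os.system( readCmd )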

Output with maxcpus=1 mem=256M (to prevent caching):

ares:~# python rndTest.py
lmdd if=internal of=/dev/sdc opat=1 bs=1M count=1000
1000.0000 MB in 2.1589 secs, 463.2016 MB/sec
sleeping 1 seconds ..
lmdd of=internal if=/dev/sdc ipat=1 bs=1M count=1000 mismatch=10
1000.0000 MB in 2.7827 secs, 359.3686 MB/sec
sleeping 1 seconds ..
lmdd of=internal if=/dev/sdc ipat=1 bs=1M count=1000 mismatch=10
1000.0000 MB in 2.8148 secs, 355.2689 MB/sec
sleeping 1 seconds ..
lmdd of=internal if=/dev/sdc ipat=1 bs=1M count=1000 mismatch=10
1000.0000 MB in 2.7981 secs, 357.3900 MB/sec
sleeping 1 seconds ..
lmdd of=internal if=/dev/sdc ipat=1 bs=1M count=1000 mismatch=10
1000.0000 MB in 2.7960 secs, 357.6489 MB/sec
sleeping 1 seconds ..
lmdd of=internal if=/dev/sdc ipat=1 bs=1M count=1000 mismatch=10
1000.0000 MB in 2.7885 secs, 358.6165 MB/sec
sleeping 1 seconds ..
lmdd of=internal if=/dev/sdc ipat=1 bs=1M count=1000 mismatch=10
1000.0000 MB in 2.8016 secs, 356.9349 MB/sec
sleeping 1 seconds ..
lmdd of=internal if=/dev/sdc ipat=1 bs=1M count=1000 mismatch=10
1000.0000 MB in 2.8066 secs, 356.3076 MB/sec
sleeping 1 seconds ..
lmdd of=internal if=/dev/sdc ipat=1 bs=1M count=1000 mismatch=10
1000.0000 MB in 2.8142 secs, 355.3358 MB/sec
sleeping 1 seconds ..
lmdd of=internal if=/dev/sdc ipat=1 bs=1M count=1000 mismatch=10
1000.0000 MB in 2.8174 secs, 354.9415 MB/sec
sleeping 1 seconds ..
lmdd of=internal if=/dev/sdc ipat=1 bs=1M count=1000 mismatch=10
1000.0000 MB in 2.8349 secs, 352.7521 MB/sec
sleeping 1 seconds ..
lmdd of=internal if=/dev/sdc ipat=1 bs=1M count=1000 mismatch=10
1000.0000 MB in 2.8078 secs, 356.1559 MB/sec
sleeping 1 seconds ..
lmdd of=internal if=/dev/sdc ipat=1 bs=1M count=1000 mismatch=10
1000.0000 MB in 2.8028 secs, 356.7876 MB/sec
sleeping 1 seconds ..
lmdd of=internal if=/dev/sdc ipat=1 bs=1M count=1000 mismatch=10
1000.0000 MB in 2.7893 secs, 358.5088 MB/sec
sleeping 1 seconds ..
lmdd of=internal if=/dev/sdc ipat=1 bs=1M count=1000 mismatch=10
1000.0000 MB in 2.7898 secs, 358.4447 MB/sec
sleeping 1 seconds ..
lmdd of=internal if=/dev/sdc ipat=1 bs=1M count=1000 mismatch=10
1000.0000 MB in 2.7887 secs, 358.5943 MB/sec
sleeping 1 seconds ..
lmdd of=internal if=/dev/sdc ipat=1 bs=1M count=1000 mismatch=10
1000.0000 MB in 2.7846 secs, 359.1125 MB/sec
sleeping 1 seconds ..
lmdd of=internal if=/dev/sdc ipat=1 bs=1M count=1000 mismatch=10
1000.0000 MB in 2.7951 secs, 357.7686 MB/sec
sleeping 1 seconds ..
lmdd of=internal if=/dev/sdc ipat=1 bs=1M count=1000 mismatch=10
1000.0000 MB in 2.7817 secs, 359.4879 MB/sec
sleeping 1 seconds ..
lmdd of=internal if=/dev/sdc ipat=1 bs=1M count=1000 mismatch=10
1000.0000 MB in 2.8016 secs, 356.9339 MB/sec
sleeping 1 seconds ..
lmdd of=internal if=/dev/sdc ipat=1 bs=1M count=1000 mismatch=10
1000.0000 MB in 2.7750 secs, 360.3620 MB/sec
sleeping 1 seconds ..
lmdd of=internal if=/dev/sdc ipat=1 bs=1M count=1000 mismatch=10
1000.0000 MB in 2.7953 secs, 357.7491 MB/sec
sleeping 1 seconds ..

This looks OK.

Now the same with two cores (maxcpus=2 mem=256M):

ares:~# python rndTest.py
lmdd if=internal of=/dev/sdc opat=1 bs=1M count=1000
1000.0000 MB in 2.3562 secs, 424.4106 MB/sec
sleeping 1 seconds ..
lmdd of=internal if=/dev/sdc ipat=1 bs=1M count=1000 mismatch=10
1000.0000 MB in 2.2888 secs, 436.9008 MB/sec
sleeping 1 seconds ..
lmdd of=internal if=/dev/sdc ipat=1 bs=1M count=1000 mismatch=10
off=119000000 want=71ff000 got=721f000
off=119000000 want=71ff004 got=721f004
off=119000000 want=71ff008 got=721f008
off=119000000 want=71ff00c got=721f00c
off=119000000 want=71ff010 got=721f010
off=119000000 want=71ff014 got=721f014
off=119000000 want=71ff018 got=721f018
off=119000000 want=71ff01c got=721f01c
off=119000000 want=71ff020 got=721f020
off=119000000 want=71ff024 got=721f024
119.0000 MB in 0.2676 secs, 444.6105 MB/sec
sleeping 1 seconds ..
lmdd of=internal if=/dev/sdc ipat=1 bs=1M count=1000 mismatch=10
off=309000000 want=1273d600 got=1275d600
off=309000000 want=1273d604 got=1275d604
off=309000000 want=1273d608 got=1275d608
off=309000000 want=1273d60c got=1275d60c
off=309000000 want=1273d610 got=1275d610
off=309000000 want=1273d614 got=1275d614
off=309000000 want=1273d618 got=1275d618
off=309000000 want=1273d61c got=1275d61c
off=309000000 want=1273d620 got=1275d620
off=309000000 want=1273d624 got=1275d624
309.0000 MB in 0.7249 secs, 426.2704 MB/sec
sleeping 1 seconds ..
lmdd of=internal if=/dev/sdc ipat=1 bs=1M count=1000 mismatch=10
off=126000000 want=7900000 got=7901000
off=126000000 want=7900004 got=7901004
off=126000000 want=7900008 got=7901008
off=126000000 want=790000c got=790100c
off=126000000 want=7900010 got=7901010
off=126000000 want=7900014 got=7901014
off=126000000 want=7900018 got=7901018
off=126000000 want=790001c got=790101c
off=126000000 want=7900020 got=7901020
off=126000000 want=7900024 got=7901024
126.0000 MB in 0.2968 secs, 424.4868 MB/sec
sleeping 1 seconds ..
lmdd of=internal if=/dev/sdc ipat=1 bs=1M count=1000 mismatch=10
off=80000000 want=4cdec00 got=4cfec00
off=80000000 want=4cdec04 got=4cfec04
off=80000000 want=4cdec08 got=4cfec08
off=80000000 want=4cdec0c got=4cfec0c
off=80000000 want=4cdec10 got=4cfec10
off=80000000 want=4cdec14 got=4cfec14
off=80000000 want=4cdec18 got=4cfec18
off=80000000 want=4cdec1c got=4cfec1c
off=80000000 want=4cdec20 got=4cfec20
off=80000000 want=4cdec24 got=4cfec24
80.0000 MB in 0.1790 secs, 447.0273 MB/sec

This behavior is completely reproducible. My guess is that the AMD
HyperTransport interconnect may interfere with the FMRs. But I am no Linux
memory management specialist, so please correct me if I am wrong.
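
One thing that may support this: in every failing read above, the data that
comes back looks like the expected pattern shifted by a small, page-aligned
amount, not like random bit errors. A quick check of the first mismatch line
of each failing read (values taken from the output above, the interpretation
is mine):

# (want, got) pairs from the first mismatch line of each failing read above
pairs = [
    (0x71ff000,  0x721f000),   # read at off=119000000
    (0x1273d600, 0x1275d600),  # read at off=309000000
    (0x7900000,  0x7901000),   # read at off=126000000
    (0x4cdec00,  0x4cfec00),   # read at off=80000000
]

for want, got in pairs:
    delta = got - want
    print 'delta = %#x (%d KB)' % (delta, delta // 1024)

# prints 0x20000 (128 KB), 0x20000 (128 KB), 0x1000 (4 KB), 0x20000 (128 KB)

So the corrupted region always contains valid pattern data from 4 KB or 128 KB
further on, which could fit a buffer page being placed at the wrong offset
rather than corruption on the wire. But again, this is only my reading of the
lmdd output.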

Maybe the following happens:
Booted with one CPU, all FMR requests go to the 16GB of RAM that this single
CPU addresses directly via its memory controller.

With more than one active CPU, memory is fetched from both CPUs' memory
controllers, with a preference for local memory. In rare cases the memory
manager fetches memory for the FMR process running on CPU0 from CPU1 via the
HyperTransport link, and something weird happens.

This is sheer guessing; I have no hard facts for this.
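
A possible way to narrow this down without rebooting with maxcpus=1 might be
to pin the running tgtd to a single core while all 8 cores stay online (just a
rough test, assuming util-linux taskset and pidof are available), e.g.:

import os

# pin the already running tgtd to CPU 0; if the read errors then disappear as
# with maxcpus=1, that would point towards cross-node memory access rather
# than SMP in general (note this does not migrate memory that is already
# allocated on the other node, so it is only an indication)
tgtd_pid = os.popen('pidof tgtd').read().split()[0]
os.system('taskset -pc 0 %s' % tgtd_pid)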

Cheers

Volker

-- 
====================================================
   inqbus it-consulting      +49 ( 341 )  5643800
   Dr.  Volker Jaenisch      http://www.inqbus.de
   Herloßsohnstr.    12      0 4 1 5 5    Leipzig
   N  O  T -  F Ä L L E      +49 ( 170 )  3113748
====================================================



