[stgt] stgtd 0.9.3 : Read-Errors using iser transport

Mon Feb 16 06:06:36 CET 2009

On Thu, 12 Feb 2009 11:29:45 +0100
"Dr. Volker Jaenisch" <volker.jaenisch at inqbus.de> wrote:

> Dear Mr. Tomonori!
> 
> We get read errors using the iser (over InfiniBand) transport with stgtd (0.9.3).
> I first discussed this on the open-iscsi mailing list.
> 
> After review of our tests I found that restarting stgt 
> cures the read-errors for the next access to the target.
> 
> Here is what we have done:
> 
> On Initiator writing:
> ares:~# lmdd if=internal of=/dev/sdc opat=1 bs=1M count=1000 mismatch=1
> 1000.0000 MB in 6.3606 secs, 157.2190 MB/sec
> 
> Check on Target is fine:
> athene:~# lmdd of=internal if=/dev/vg0/test ipat=1 bs=1M count=1000 mismatch=1
> 1000.0000 MB in 0.8849 secs, 1130.0176 MB/sec
> 
> On initiator reading:
> ares:~# lmdd of=internal if=/dev/sdc ipat=1 bs=1M count=1000 mismatch=10
> off=1000000 want=1a0000 got=1b3000
> off=1000000 want=1a0004 got=1b3004
> off=1000000 want=1a0008 got=1b3008
> off=1000000 want=1a000c got=1b300c
> off=1000000 want=1a0010 got=1b3010
> off=1000000 want=1a0014 got=1b3014
> off=1000000 want=1a0018 got=1b3018
> off=1000000 want=1a001c got=1b301c
> off=1000000 want=1a0020 got=1b3020
> off=1000000 want=1a0024 got=1b3024
> 1.0000 MB in 0.0064 secs, 157.2822 MB/sec
> 
> But if I restart the tgt daemon on the target side, everything is OK.
> ares:~# lmdd of=internal if=/dev/sdc ipat=1 bs=1M count=1000 mismatch=10
> 1000.0000 MB in 22.2695 secs, 44.9045 MB/sec
> But only for the first run of lmdd! After that the error occurs reproducibly
> every time.
> 
> ares:~# lmdd of=internal if=/dev/sdc ipat=1 bs=1M count=1000 mismatch=10
> off=0 want=8ae00 got=a9e00
> off=0 want=8ae04 got=a9e04
> off=0 want=8ae08 got=a9e08
> off=0 want=8ae0c got=a9e0c
> off=0 want=8ae10 got=a9e10
> off=0 want=8ae14 got=a9e14
> off=0 want=8ae18 got=a9e18
> off=0 want=8ae1c got=a9e1c
> off=0 want=8ae20 got=a9e20
> off=0 want=8ae24 got=a9e24
> 0.0000 MB in 0.0029 secs, 0.0000 MB/sec
> ares:~# lmdd of=internal if=/dev/sdc ipat=1 bs=1M count=1000 mismatch=10
> off=51000000 want=3129e00 got=3147e00
> off=51000000 want=3129e04 got=3147e04
> off=51000000 want=3129e08 got=3147e08
> off=51000000 want=3129e0c got=3147e0c
> off=51000000 want=3129e10 got=3147e10
> off=51000000 want=3129e14 got=3147e14
> off=51000000 want=3129e18 got=3147e18
> off=51000000 want=3129e1c got=3147e1c
> off=51000000 want=3129e20 got=3147e20
> off=51000000 want=3129e24 got=3147e24
> 51.0000 MB in 0.1463 secs, 348.5702 MB/sec
> 
> How to debug further?

In short,

- the target box got the data from the initiator box and wrote it to
disk properly.

- the target box read the data and sent it to the initiator
properly on the first run after restarting tgtd.

- on every run after that, the target box sent the wrong data.

Right?

How about writing twice? Can the target still store the data (the data
written on the second run) on disk correctly?
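
It might also help to look at how far the returned data is displaced
from the expected data. Below is a minimal Python sketch, not part of
lmdd or tgt, that extracts the want=/got= pairs from the lmdd mismatch
output and prints the difference. It assumes the want/got values are
hexadecimal byte offsets in the test pattern (they increase by 4 per
line in the logs above, which suggests this, but that is an
assumption). The function name displacements and the sample text are
only for illustration; the sample lines are copied from the report.

```python
import re

# Matches lmdd mismatch lines such as "off=1000000 want=1a0000 got=1b3000".
# Assumption: want/got are hexadecimal, as the values in the report suggest.
MISMATCH = re.compile(r"off=([0-9a-f]+)\s+want=([0-9a-f]+)\s+got=([0-9a-f]+)")


def displacements(log_text):
    """Return the set of (got - want) differences found in lmdd output."""
    deltas = set()
    for m in MISMATCH.finditer(log_text):
        want = int(m.group(2), 16)
        got = int(m.group(3), 16)
        deltas.add(got - want)
    return deltas


# A few lines taken from the three failing runs quoted above.
sample = """\
> off=1000000 want=1a0000 got=1b3000
> off=1000000 want=1a0004 got=1b3004
> off=0 want=8ae00 got=a9e00
> off=51000000 want=3129e00 got=3147e00
"""

for d in sorted(displacements(sample)):
    # Print the displacement and whether it is a multiple of 4 KiB.
    print(hex(d), d % 4096 == 0)
```

For the lines above, the differences are 0x13000, 0x1e000 and 0x1f000.
Each is constant within a run and a multiple of 4096 bytes. If the
assumption about the pattern holds, this would mean the initiator
receives valid pattern data from the wrong location rather than
garbage, but the logs alone do not show which side is responsible.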