[Stgt-devel] iSER multiple readers

Pete Wyckoff pw
Fri Feb 8 20:11:33 CET 2008

robin.humble+stgt at anu.edu.au wrote on Thu, 07 Feb 2008 21:07 -0500:
> I think I'm seeing iSER read corruption problems.
> a) in stock centos5.1 when not using kernel I get read
>    corruption with just a single reader
> b) every kernel/OS/ofed combination that I've tried when there are
>    multiple simultaneous readers
> the multiple reader problem happens whether the data is read from
> multiple luns or clients or... well, just multiple reads to a single
> tgtd doing iSER seems to be enough to cause it.
> I'm hoping these problems are just something that I've broken in my
> setup... previously 2.6.18-52.el5 + centos5 worked (in the single
> reader case) but I can't make it work now.
> is anyone else seeing easy read corruption?
> the easiest way I can reproduce it is:
>  initiator side - write data:
>    lmdd if=internal of=/dev/sdc opat=1 bs=1M count=1000
>  target side - check that the file is ok (it is):
>    lmdd of=internal if=/mnt/ramdisk/file ipat=1 bs=1M count=1000 mismatch=1
>  initiator side - read and check data (is sometimes ok):
>    lmdd of=internal if=/dev/sdc ipat=1 bs=1M count=1000 mismatch=1
>  initiator side - read data with 2 processes at once (always fails):
>    lmdd of=internal if=/dev/sdc ipat=1 bs=1M count=1000 mismatch=1 &
>    lmdd of=internal if=/dev/sdc ipat=1 bs=1M count=1000 mismatch=1 &
> I'm using the kernel git tree of stgt.
> I don't see any problems when using TCP IPoIB.

I've tried this and a few variations but can't reproduce any
problems, which is unfortunate.  To debug it, perhaps you can
investigate the mismatched data that comes back and see if you can
discern a pattern.  If the corruption always starts at 4k
boundaries, or always at 512k boundaries, for example, that could
help us narrow it down.
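
One way to look for such a pattern is sketched below.  It is only an
illustration, not part of stgt or lmdd: it assumes you have a known-good
copy of the data (for example the backing file on the target) and a
dump of what the initiator read back, both loaded as bytes-like
objects.  The names mismatch_runs, alignment_summary and BOUNDARIES are
made up for this example, and the 512-byte comparison granularity is an
assumption.

```python
from collections import Counter

# Candidate boundaries to test mismatch start offsets against (bytes).
BOUNDARIES = {"512": 512, "4k": 4096, "512k": 512 * 1024}


def mismatch_runs(expected, actual, sector=512):
    """Return (start_offset, length) for each contiguous run of
    sectors that differ between the two bytes-like objects."""
    runs = []
    start = None
    n = min(len(expected), len(actual))
    for off in range(0, n, sector):
        same = expected[off:off + sector] == actual[off:off + sector]
        if not same and start is None:
            start = off                      # a bad run begins here
        elif same and start is not None:
            runs.append((start, off - start))  # the bad run ended
            start = None
    if start is not None:                    # run extends to end of data
        runs.append((start, n - start))
    return runs


def alignment_summary(runs):
    """Count how many runs start exactly on each candidate boundary."""
    counts = Counter()
    for start, _length in runs:
        for name, size in BOUNDARIES.items():
            if start % size == 0:
                counts[name] += 1
    return dict(counts)
```

Since comparison is per 512-byte sector, every run trivially starts on
a 512-byte boundary; the interesting question is whether all of them
also land on the 4k or 512k counts, and whether the run lengths
cluster around one size.  For a 1 GB file it is probably better to
mmap both files than to read them into memory.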

You had another corruption issue a long time ago that I thought was
related to the response message getting in front of the RDMA.  But
the IB guys insist that this is not possible.  I had a patch, which I
very much did not like, that delayed the final response message until
the target saw the local completions for its RDMAs.  It never went
in.  It is dated 16 Oct 2007; I mention it in case your notes or mail
archives lead you to believe this current read corruption is similar.

		-- Pete
