[Stgt-devel] iSER

Pete Wyckoff pw
Thu Oct 4 19:20:35 CEST 2007


pw at osc.edu wrote on Sun, 09 Sep 2007 14:12 -0400:
> robin.humble+stgt at anu.edu.au wrote on Sun, 09 Sep 2007 11:30 -0400:
> > Summary:
> >  - 2.6.21 seems to be a good kernel. 2.6.22 or newer, or RedHat's OFED 1.2
> >    patched kernels all seem to have iSER bugs that make them unusable.
> >  - as everything works in 2.6.21 presumably this means there's nothing
> >    wrong with the iSER implementation in tgtd. well done! :)
> 
> Well, that's good and bad news.  Nice to know that things do work at times,
> but we have to figure out what happened in the initiator now.  Or maybe tgt
> is making some bad assumptions.

This all turned out to be a known bug in the mthca IB driver in
kernels older than 2.6.21, including the rhel5 kernel.  The
initiator uses FMR for memory registrations, and a certain popular
chipset was prone to random scribbling on old registrations,
yielding wrong data in the application or unexplainable kernel
crashes.  Nothing wrong in the target.

> > with the 2.6.22.6 kernel and iSER I couldn't find any corruption
> > issues using dd to /dev/sdc. however (as reported previously) if I put
> > an ext3 filesystem on the iSER device and then dd to a file in the ext3
> > filesystem then pretty much immediately I get:
> >   Sep  9 21:46:22 x11 kernel: EXT3-fs error (device sdc): ext3_new_block: Allocating block in system zone - blocks from 196611, length 1
> >   Sep  9 21:46:22 x11 kernel: EXT3-fs error (device sdc): ext3_new_block: Allocating block in system zone - blocks from 196612, length 1
> >   Sep  9 21:46:22 x11 kernel: EXT3-fs error (device sdc): ext3_new_block: Allocating block in system zone - blocks from 196613, length 1
> >   ...
> > 
> > I get the same type of errors with 2.6.23-rc5 too.
> 
> I've still not been able to reproduce this, at least on my
> 2.6.22-rc5.  One of these days we'll move to some newer kernels
> here, but have been sort of waiting for the bidi approaches to
> stabilize somewhat.

Maybe this is fixed.  I did find one possible case where the Send
result may have gone out before the final RDMA write, in the case
when the target is starved for RDMA slots.  But I never saw the
problem myself, so can't say for sure.
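
For illustration, the ordering constraint can be sketched roughly as
below.  This is not the tgtd code: the struct names, the slot limit and
the log array are all hypothetical, and nothing here touches real verbs.
It assumes the RDMA writes and the Send go out on the same
reliable-connected QP, where work requests posted to one send queue
execute in order, so it is enough to hold the Send back until the last
RDMA write has been posted.

```c
#include <assert.h>

#define MAX_RDMA_SLOTS 2    /* hypothetical per-connection limit */
#define LOG_MAX 16

enum op { OP_RDMA_WRITE, OP_SEND_RESPONSE };

struct task {
    int rdma_total;         /* RDMA writes needed to move all the data */
    int rdma_posted;        /* RDMA writes posted so far */
    int response_sent;      /* nonzero once the Send has been posted */
};

struct conn {
    int free_slots;         /* RDMA slots currently available */
    enum op log[LOG_MAX];   /* order in which work was "posted" */
    int nlog;
};

/*
 * Post as many RDMA writes as the free slots allow.  The response Send
 * is posted only once every RDMA write for the task has been posted,
 * never merely because the task ran out of slots.
 */
static void task_progress(struct conn *c, struct task *t)
{
    while (t->rdma_posted < t->rdma_total && c->free_slots > 0) {
        assert(c->nlog < LOG_MAX);
        c->free_slots--;
        t->rdma_posted++;
        c->log[c->nlog++] = OP_RDMA_WRITE;
    }
    if (t->rdma_posted == t->rdma_total && !t->response_sent) {
        assert(c->nlog < LOG_MAX);
        t->response_sent = 1;
        c->log[c->nlog++] = OP_SEND_RESPONSE;
    }
}

/* Completion of one RDMA write frees a slot; retry the starved task. */
static void rdma_write_done(struct conn *c, struct task *t)
{
    c->free_slots++;
    task_progress(c, t);
}
```

With two free slots and a task that needs three RDMA writes, the first
task_progress() call posts two writes and no Send; the Send appears
only after rdma_write_done() lets the third write go out.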

In fact, I hacked up the bs-sync code to calculate the result
expected by the test application lmdd, rather than read it off disk,
and could achieve your high throughputs but never any corruptions.
It ran all night last night.
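
The idea of that hack, roughly: on a read the backing store fills the
buffer with whatever the test program expects at that offset, so the
disk never becomes the bottleneck.  A minimal sketch follows, not the
actual bs-sync change.  The function names are made up, and the pattern
(each 32-bit word holds its own byte offset) is an assumption chosen
for illustration; the real lmdd pattern may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Fill buf with the data expected at byte position `offset` of the
 * device.  ASSUMED pattern: each 32-bit word contains its own byte
 * offset, truncated to 32 bits.
 */
static void fill_expected(uint32_t *buf, size_t len, uint64_t offset)
{
    size_t i;

    assert(len % sizeof(*buf) == 0);
    for (i = 0; i < len / sizeof(*buf); i++)
        buf[i] = (uint32_t)(offset + i * sizeof(*buf));
}

/*
 * Check a buffer against the same pattern.  Returns the index of the
 * first mismatching word, or -1 if the whole buffer is as expected.
 */
static long check_expected(const uint32_t *buf, size_t len, uint64_t offset)
{
    size_t i;

    assert(len % sizeof(*buf) == 0);
    for (i = 0; i < len / sizeof(*buf); i++)
        if (buf[i] != (uint32_t)(offset + i * sizeof(*buf)))
            return (long)i;
    return -1;
}
```

Because the pattern depends only on the offset, the initiator side can
verify every block it reads without the target keeping any state.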

Anyway, there's a new git out there with this one new patch and some
kernel initiator warnings in the README.iser doc.

		-- Pete


