[Stgt-devel] iSER

Fri Oct 5 01:06:11 CEST 2007

On Thu, 4 Oct 2007 13:20:35 -0400
Pete Wyckoff <pw at osc.edu> wrote:

> pw at osc.edu wrote on Sun, 09 Sep 2007 14:12 -0400:
> > robin.humble+stgt at anu.edu.au wrote on Sun, 09 Sep 2007 11:30 -0400:
> > > Summary:
> > >  - 2.6.21 seems to be a good kernel. 2.6.22 or newer, or RedHat's OFED 1.2
> > >    patched kernels all seem to have iSER bugs that make them unusable.
> > >  - as everything works in 2.6.21 presumably this means there's nothing
> > >    wrong with the iSER implementation in tgtd. well done! :)
> > 
> > Well, that's good and bad news.  Nice to know that things do work at times,
> > but we have to figure out what happened in the initiator now.  Or maybe tgt
> > is making some bad assumptions.
> 
> This all turned out to be a known bug in the mthca IB driver in
> kernels older than 2.6.21.  Including the rhel5 kernel.  The
> initiator uses FMR for memory registrations, and a certain popular
> chipset was prone to random scribbling on old registrations,
> yielding wrong data in the application or unexplainable kernel
> crashes.  Nothing wrong in the target.
> 
> > > with the 2.6.22.6 kernel and iSER I couldn't find any corruption
> > > issues using dd to /dev/sdc. however (as reported previously) if I put
> > > an ext3 filesystem on the iSER device and then dd to a file in the ext3
> > > filesystem then pretty much immediately I get:
> > >   Sep  9 21:46:22 x11 kernel: EXT3-fs error (device sdc): ext3_new_block: Allocating block in system zone - blocks from 196611, length 1
> > >   Sep  9 21:46:22 x11 kernel: EXT3-fs error (device sdc): ext3_new_block: Allocating block in system zone - blocks from 196612, length 1
> > >   Sep  9 21:46:22 x11 kernel: EXT3-fs error (device sdc): ext3_new_block: Allocating block in system zone - blocks from 196613, length 1
> > >   ...
> > > 
> > > I get the same type of errors with 2.6.23-rc5 too.
> > 
> > I've still not been able to reproduce this, at least on my
> > 2.6.22-rc5.  One of these days we'll move to some newer kernels
> > here, but have been sort of waiting for the bidi approaches to
> > stabilize somewhat.
> 
> Maybe this is fixed.  I did find one possible case where the Send
> result may have gone out before the final RDMA write, in the case
> when the target is starved for RDMA slots.  But I never saw the
> problem myself, so can't say for sure.
> 
> In fact, I hacked up the bs-sync code to calculate the result
> expected by the test application lmdd, rather than read it off disk,
> and could achieve your high throughputs but never any corruptions.
> It ran all night last night.
> 
> Anyway, there's a new git out there with this one new patch and some
> kernel initiator warnings in the README.iser doc.

Sounds promising. Voltaire guys, any chance to try Pete's latest tree?
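
For anyone following along, here is a rough sketch of the ordering rule the
patch is described as enforcing: when the target runs out of RDMA slots, the
Send carrying the response has to wait until the last RDMA write for that
command has been posted. This is a toy model in Python, not the tgtd code;
the names Target, submit_read, rdma_write_completed and the "wire" list are
all made up for illustration.

```python
from collections import deque


class Target:
    """Toy model of a target with a limited number of RDMA slots.

    Hypothetical illustration only; none of these names come from tgtd.
    """

    def __init__(self, rdma_slots):
        self.free_slots = rdma_slots
        self.wire = []          # work requests, in the order they were posted
        self.pending = deque()  # (task_id, chunk) still waiting for a slot
        self.unposted = {}      # task_id -> RDMA writes not yet posted

    def submit_read(self, task_id, num_chunks):
        """Queue the RDMA writes that carry the data for one read command."""
        if num_chunks == 0:
            # No data phase, so the response can go out right away.
            self.wire.append(("SEND_RESPONSE", task_id))
            return
        self.unposted[task_id] = num_chunks
        for chunk in range(num_chunks):
            self.pending.append((task_id, chunk))
        self._pump()

    def rdma_write_completed(self):
        """A completion frees one slot, so more queued writes may be posted."""
        self.free_slots += 1
        self._pump()

    def _pump(self):
        # Post as many queued writes as there are free slots.
        while self.pending and self.free_slots > 0:
            task_id, chunk = self.pending.popleft()
            self.free_slots -= 1
            self.wire.append(("RDMA_WRITE", task_id, chunk))
            self.unposted[task_id] -= 1
            # Only after the final write is posted may the response follow.
            if self.unposted[task_id] == 0:
                del self.unposted[task_id]
                self.wire.append(("SEND_RESPONSE", task_id))


target = Target(rdma_slots=2)
target.submit_read("cmd1", 5)     # starved: only 2 of the 5 writes fit
for _ in range(3):
    target.rdma_write_completed()  # each completion lets one more write out
```

With two slots and five chunks, the response is held back until the third
completion lets the fifth write out, so it is always the last entry posted
for that command.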