Fri Oct 5 01:06:11 CEST 2007
On Thu, 4 Oct 2007 13:20:35 -0400
Pete Wyckoff <pw at osc.edu> wrote:
> pw at osc.edu wrote on Sun, 09 Sep 2007 14:12 -0400:
> > robin.humble+stgt at anu.edu.au wrote on Sun, 09 Sep 2007 11:30 -0400:
> > > Summary:
> > > - 2.6.21 seems to be a good kernel. 2.6.22 or newer, or RedHat's OFED 1.2
> > > patched kernels all seem to have iSER bugs that make them unusable.
> > > - as everything works in 2.6.21 presumably this means there's nothing
> > > wrong with the iSER implementation in tgtd. well done! :)
> > Well, that's good and bad news. Nice to know that things do work at times,
> > but we have to figure out what happened in the initiator now. Or maybe tgt
> > is making some bad assumptions.
> This all turned out to be a known bug in the mthca IB driver in
> kernels older than 2.6.21. Including the rhel5 kernel. The
> initiator uses FMR for memory registrations, and a certain popular
> chipset was prone to random scribbling on old registrations,
> yielding wrong data in the application or unexplainable kernel
> crashes. Nothing wrong in the target.
> > > with the 184.108.40.206 kernel and iSER I couldn't find any corruption
> > > issues using dd to /dev/sdc. however (as reported previously) if I put
> > > an ext3 filesystem on the iSER device and then dd to a file in the ext3
> > > filesystem then pretty much immediately I get:
> > > Sep 9 21:46:22 x11 kernel: EXT3-fs error (device sdc): ext3_new_block: Allocating block in system zone - blocks from 196611, length 1
> > > Sep 9 21:46:22 x11 kernel: EXT3-fs error (device sdc): ext3_new_block: Allocating block in system zone - blocks from 196612, length 1
> > > Sep 9 21:46:22 x11 kernel: EXT3-fs error (device sdc): ext3_new_block: Allocating block in system zone - blocks from 196613, length 1
> > > ...
> > >
> > > I get the same type of errors with 2.6.23-rc5 too.
> > I've still not been able to reproduce this, at least on my
> > 2.6.22-rc5. One of these days we'll move to some newer kernels
> > here, but have been sort of waiting for the bidi approaches to
> > stabilize somewhat.
> Maybe this is fixed. I did find one possible case where the Send
> result may have gone out before the final RDMA write, in the case
> when the target is starved for RDMA slots. But I never saw the
> problem myself, so can't say for sure.
> In fact, I hacked up the bs-sync code to calculate the result
> expected by the test application lmdd, rather than read it off disk,
> and could achieve your high throughputs but never any corruptions.
> It ran all night last night.
> Anyway, there's a new git out there with this one new patch and some
> kernel initiator warnings in the README.iser doc.
Sounds promising. Voltaire guys, any chance to try Pete's latest tree?