[Stgt-devel] iSER

Thu Oct 11 07:57:42 CEST 2007

On Mon, 08 Oct 2007 17:36:16 +0200
Erez Zilber <erezz at Voltaire.COM> wrote:

> FUJITA Tomonori wrote:
> > On Thu, 4 Oct 2007 13:20:35 -0400
> > Pete Wyckoff <pw at osc.edu> wrote:
> >
> >   
> >> pw at osc.edu wrote on Sun, 09 Sep 2007 14:12 -0400:
> >>     
> >>> robin.humble+stgt at anu.edu.au wrote on Sun, 09 Sep 2007 11:30 -0400:
> >>>       
> >>>> Summary:
> >>>>  - 2.6.21 seems to be a good kernel. 2.6.22 or newer, or RedHat's OFED 1.2
> >>>>    patched kernels all seem to have iSER bugs that make them unusable.
> >>>>  - as everything works in 2.6.21 presumably this means there's nothing
> >>>>    wrong with the iSER implementation in tgtd. well done! :)
> >>>>         
> >>> Well, that's good and bad news.  Nice to know that things do work at times,
> >>> but we have to figure out what happened in the initiator now.  Or maybe tgt
> >>> is making some bad assumptions.
> >>>       
> >> This all turned out to be a known bug in the mthca IB driver in
> >> kernels older than 2.6.21.  Including the rhel5 kernel.  The
> >> initiator uses FMR for memory registrations, and a certain popular
> >> chipset was prone to random scribbling on old registrations,
> >> yielding wrong data in the application or unexplainable kernel
> >> crashes.  Nothing wrong in the target.
> >>
> >>     
> >>>> with the 2.6.22.6 kernel and iSER I couldn't find any corruption
> >>>> issues using dd to /dev/sdc. however (as reported previously) if I put
> >>>> an ext3 filesystem on the iSER device and then dd to a file in the ext3
> >>>> filesystem then pretty much immediately I get:
> >>>>   Sep  9 21:46:22 x11 kernel: EXT3-fs error (device sdc): ext3_new_block: Allocating block in system zone - blocks from 196611, length 1
> >>>>   Sep  9 21:46:22 x11 kernel: EXT3-fs error (device sdc): ext3_new_block: Allocating block in system zone - blocks from 196612, length 1
> >>>>   Sep  9 21:46:22 x11 kernel: EXT3-fs error (device sdc): ext3_new_block: Allocating block in system zone - blocks from 196613, length 1
> >>>>   ...
> >>>>
> >>>> I get the same type of errors with 2.6.23-rc5 too.
> >>>>         
> >>> I've still not been able to reproduce this, at least on my
> >>> 2.6.22-rc5.  One of these days we'll move to some newer kernels
> >>> here, but have been sort of waiting for the bidi approaches to
> >>> stabilize somewhat.
> >>>       
> >> Maybe this is fixed.  I did find one possible case where the Send
> >> result may have gone out before the final RDMA write, in the case
> >> when the target is starved for RDMA slots.  But I never saw the
> >> problem myself, so can't say for sure.
> >>
> >> In fact, I hacked up the bs-sync code to calculate the result
> >> expected by the test application lmdd, rather than read it off disk,
> >> and could achieve your high throughputs but never any corruptions.
> >> It ran all night last night.
> >>
> >> Anyway, there's a new git out there with this one new patch and some
> >> kernel initiator warnings in the README.iser doc.
> >>     
> >
> > Sounds promising. Voltaire guys, any chance you could try Pete's latest tree?
> 
> We ran some tests on it and it looks ok now (still trying to make it 
> crash :-) ). We will run more nasty tests soon, and if anything goes 
> wrong, we will report. We will also try to get some performance numbers 
> (BW, iops) from our storage.

Cool.

Pete, is the iSER patchset ready for re-submission?

BTW, can you elaborate on the following commit?

http://git.osc.edu/?p=tgt.git;a=commit;h=8d9eae7acd041fc10a7cfe560c1c280dcc290fa1

What type of commands hit this bug?

Thanks,