On Thu, 4 Oct 2007 13:20:35 -0400 Pete Wyckoff <pw at osc.edu> wrote:

> pw at osc.edu wrote on Sun, 09 Sep 2007 14:12 -0400:
> > robin.humble+stgt at anu.edu.au wrote on Sun, 09 Sep 2007 11:30 -0400:
> > > Summary:
> > >  - 2.6.21 seems to be a good kernel. 2.6.22 or newer, or RedHat's OFED 1.2
> > >    patched kernels all seem to have iSER bugs that make them unusable.
> > >  - as everything works in 2.6.21 presumably this means there's nothing
> > >    wrong with the iSER implementation in tgtd. well done! :)
> >
> > Well, that's good and bad news. Nice to know that things do work at times,
> > but we have to figure out what happened in the initiator now. Or maybe tgt
> > is making some bad assumptions.
>
> This all turned out to be a known bug in the mthca IB driver in
> kernels older than 2.6.21. Including the rhel5 kernel. The
> initiator uses FMR for memory registrations, and a certain popular
> chipset was prone to random scribbling on old registrations,
> yielding wrong data in the application or unexplainable kernel
> crashes. Nothing wrong in the target.
>
> > > with the 2.6.22.6 kernel and iSER I couldn't find any corruption
> > > issues using dd to /dev/sdc. however (as reported previously) if I put
> > > an ext3 filesystem on the iSER device and then dd to a file in the ext3
> > > filesystem then pretty much immediately I get:
> > > Sep 9 21:46:22 x11 kernel: EXT3-fs error (device sdc): ext3_new_block: Allocating block in system zone - blocks from 196611, length 1
> > > Sep 9 21:46:22 x11 kernel: EXT3-fs error (device sdc): ext3_new_block: Allocating block in system zone - blocks from 196612, length 1
> > > Sep 9 21:46:22 x11 kernel: EXT3-fs error (device sdc): ext3_new_block: Allocating block in system zone - blocks from 196613, length 1
> > > ...
> > >
> > > I get the same type of errors with 2.6.23-rc5 too.
> >
> > I've still not been able to reproduce this, at least on my
> > 2.6.22-rc5. One of these days we'll move to some newer kernels
> > here, but have been sort of waiting for the bidi approaches to
> > stabilize somewhat.
>
> Maybe this is fixed. I did find one possible case where the Send
> result may have gone out before the final RDMA write, in the case
> when the target is starved for RDMA slots. But I never saw the
> problem myself, so can't say for sure.
>
> In fact, I hacked up the bs-sync code to calculate the result
> expected by the test application lmdd, rather than read it off disk,
> and could achieve your high throughputs but never any corruptions.
> It ran all night last night.
>
> Anyway, there's a new git out there with this one new patch and some
> kernel initiator warnings in the README.iser doc.

Sounds promising. Voltaire guys, any chance to try Pete's latest tree?