[Stgt-devel] iSER

Erez Zilber erezz
Mon Oct 8 17:36:16 CEST 2007


FUJITA Tomonori wrote:
> On Thu, 4 Oct 2007 13:20:35 -0400
> Pete Wyckoff <pw at osc.edu> wrote:
>
>   
>> pw at osc.edu wrote on Sun, 09 Sep 2007 14:12 -0400:
>>     
>>> robin.humble+stgt at anu.edu.au wrote on Sun, 09 Sep 2007 11:30 -0400:
>>>       
>>>> Summary:
>>>>  - 2.6.21 seems to be a good kernel. 2.6.22 or newer, or RedHat's OFED 1.2
>>>>    patched kernels all seem to have iSER bugs that make them unusable.
>>>>  - as everything works in 2.6.21 presumably this means there's nothing
>>>>    wrong with the iSER implementation in tgtd. well done! :)
>>>>         
>>> Well, that's good and bad news.  Nice to know that things do work at times,
>>> but we have to figure out what happened in the initiator now.  Or maybe tgt
>>> is making some bad assumptions.
>>>       
>> This all turned out to be a known bug in the mthca IB driver in
>> kernels older than 2.6.21, including the rhel5 kernel.  The
>> initiator uses FMR for memory registrations, and a certain popular
>> chipset was prone to random scribbling on old registrations,
>> yielding wrong data in the application or unexplainable kernel
>> crashes.  Nothing wrong in the target.
>>
>>     
>>>> with the 2.6.22.6 kernel and iSER I couldn't find any corruption
>>>> issues using dd to /dev/sdc. however (as reported previously) if I put
>>>> an ext3 filesystem on the iSER device and then dd to a file in the ext3
>>>> filesystem then pretty much immediately I get:
>>>>   Sep  9 21:46:22 x11 kernel: EXT3-fs error (device sdc): ext3_new_block: Allocating block in system zone - blocks from 196611, length 1
>>>>   Sep  9 21:46:22 x11 kernel: EXT3-fs error (device sdc): ext3_new_block: Allocating block in system zone - blocks from 196612, length 1
>>>>   Sep  9 21:46:22 x11 kernel: EXT3-fs error (device sdc): ext3_new_block: Allocating block in system zone - blocks from 196613, length 1
>>>>   ...
>>>>
>>>> I get the same type of errors with 2.6.23-rc5 too.
>>>>         
>>> I've still not been able to reproduce this, at least on my
>>> 2.6.22-rc5.  One of these days we'll move to some newer kernels
>>> here, but have been sort of waiting for the bidi approaches to
>>> stabilize somewhat.
>>>       
>> Maybe this is fixed.  I did find one possible case where the Send
>> result may have gone out before the final RDMA write, in the case
>> when the target is starved for RDMA slots.  But I never saw the
>> problem myself, so can't say for sure.
>>
>> In fact, I hacked up the bs-sync code to calculate the result
>> expected by the test application lmdd, rather than read it off disk,
>> and could achieve your high throughputs but never any corruptions.
>> It ran all night last night.
>>
>> Anyway, there's a new git out there with this one new patch and some
>> kernel initiator warnings in the README.iser doc.
>>     
>
> Sounds promising. Voltaire guys, any chance to try Pete's latest tree?

We ran some tests on it and it looks ok now (still trying to make it 
crash :-) ). We will run more nasty tests soon, and if anything goes 
wrong, we will report. We will also try to get some performance numbers 
(BW, iops) from our storage.

Thanks to Pete for donating the iSER code and fixing bugs.

Erez