[Stgt-devel] iSER
Pete Wyckoff
Sun Sep 9 20:12:42 CEST 2007
robin.humble+stgt at anu.edu.au wrote on Sun, 09 Sep 2007 11:30 -0400:
> Summary:
> - 2.6.21 seems to be a good kernel. 2.6.22 or newer, or RedHat's OFED 1.2
> patched kernels all seem to have iSER bugs that make them unusable.
> - as everything works in 2.6.21 presumably this means there's nothing
> wrong with the iSER implementation in tgtd. well done! :)
Well, that's good and bad news. Nice to know that things do work at times,
but we have to figure out what happened in the initiator now. Or maybe tgt
is making some bad assumptions.
> with the 2.6.22.6 kernel and iSER I couldn't find any corruption
> issues using dd to /dev/sdc. however (as reported previously) if I put
> an ext3 filesystem on the iSER device and then dd to a file in the ext3
> filesystem then pretty much immediately I get:
> Sep 9 21:46:22 x11 kernel: EXT3-fs error (device sdc): ext3_new_block: Allocating block in system zone - blocks from 196611, length 1
> Sep 9 21:46:22 x11 kernel: EXT3-fs error (device sdc): ext3_new_block: Allocating block in system zone - blocks from 196612, length 1
> Sep 9 21:46:22 x11 kernel: EXT3-fs error (device sdc): ext3_new_block: Allocating block in system zone - blocks from 196613, length 1
> ...
>
> I get the same type of errors with 2.6.23-rc5 too.
I've still not been able to reproduce this, at least on my
2.6.22-rc5. One of these days we'll move to some newer kernels
here, but we've been waiting for the bidi approaches to
stabilize somewhat.
The only issue I've found is a slight race condition when the
initiator unexpectedly hangs up. The target would exit if it saw
a work request flush before seeing the CM disconnect event. I've
added a new patch to the git tree to fix this, but it doesn't
explain your corruption issues.
> with 2.6.21 (mem=512M) on the initiator side and 2.6.21 or 2.6.22.6
> (7.1g ramdisk as backing store) on the target side, everything seems to work fine.
> eg. bonnie++
>
> Version 1.03 ------Sequential Output------ --Sequential Input- --Random-
> -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
> x11 512M 80329 99 521771 99 224506 44 85983 95 525440 49 +++++ +++
> x11 1G 80649 99 484939 92 207655 43 59377 98 488031 41 13703 14
> x11 2G 79976 99 461833 94 208618 42 74189 97 467245 39 10060 13
> x11 4G 79873 99 487361 97 210199 43 87312 98 484341 42 8459 13
> ------Sequential Create------ --------Random Create--------
> -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
> files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
> 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
> 64 80318 99 +++++ +++ 86949 99 80277 99 +++++ +++ 83630 100
> 256 68904 97 436942 98 61886 83 67777 95 +++++ +++ 48291 69
> 512 40226 62 34164 25 37500 65 44426 67 22325 18 28473 53
You're getting in the neighborhood of 500 MB/s for block reads _and_ writes
through ext3. This is different from your earlier results with dd:
robin.humble+stgt at anu.edu.au wrote on Wed, 05 Sep 2007 10:46 -0400:
> bypassing the page cache (and readahead?) with O_DIRECT:
> eg. dd if=/dev/zero of=/dev/sdc bs=1k count=8000 oflag=direct
> bs write MB/s read MB/s
> 10M 1200 520
> 1M 790 460
> 200k 480 350
> 4k 40 34
> 1k 11 9
> large writes look fabulous, but reads seem to be limited by something
> other than IB bandwidth.
>
> in the more usual usage case via the page cache:
> eg. dd if=/dev/zero of=/dev/sdc bs=1k count=8000000
> bs write MB/s read MB/s
> 10M 1100 260
> 1M 1100 270
> 4k 960 270
> 1k 30 240
> so maybe extra copies to/from page cache are getting in the way of the
> read bandwidth and are lowering it by a factor of 2.
> I'm guessing the good small block read performance here is due to
> readahead, and the mostly better writes are from aggregation.
We see behavior similar to your earlier dd tests, where reads are
too slow, and have started trying to figure out what's taking so
long in the code. One would expect reads to be the fast case, as
they map to RDMA write operations.
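To make that expectation concrete, serving a SCSI READ in iSER boils down
to one RDMA WRITE posted against the buffer the initiator advertised.  A
hedged sketch of that fast path follows (function and parameter names are
illustrative, not tgt's actual code path):

/* Illustration only: push READ data to the initiator with a single RDMA
 * WRITE.  remote_addr/rkey would come from the iSER header of the READ. */
#include <stdint.h>
#include <string.h>
#include <infiniband/verbs.h>

static int rdma_write_read_data(struct ibv_qp *qp, struct ibv_mr *mr,
				void *buf, size_t len,
				uint64_t remote_addr, uint32_t rkey)
{
	struct ibv_sge sge = {
		.addr   = (uintptr_t)buf,
		.length = (uint32_t)len,
		.lkey   = mr->lkey,
	};
	struct ibv_send_wr wr, *bad_wr;

	memset(&wr, 0, sizeof(wr));
	wr.opcode              = IBV_WR_RDMA_WRITE;
	wr.send_flags          = IBV_SEND_SIGNALED;
	wr.sg_list             = &sge;
	wr.num_sge             = 1;
	wr.wr.rdma.remote_addr = remote_addr;
	wr.wr.rdma.rkey        = rkey;

	return ibv_post_send(qp, &wr, &bad_wr);
}

The initiator-side HCA places the data directly, so there is no obvious
reason for reads to trail writes; that is what we're digging into.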
Thanks for doing all this testing.
-- Pete