[Stgt-devel] Tuning iSER for performance

Erez Zilber erezz
Mon Mar 10 14:20:17 CET 2008


Pete Wyckoff wrote:
> erezz at Voltaire.COM wrote on Sun, 09 Mar 2008 16:06 +0200:
>   
>>> That's the only way to get multi-operation concurrency on the
>>> target given existing linux userspace AIO infrastructure:  run
>>> multiple IOs at the same time, in separate threads.
>>>
>>> The patch I sent you is just so you can debug, to answer the
>>> question:  Is the context switch the source of your performance
>>> problem on a _single_ request.
>>>
>>>   
>>>       
>> I guess that I missed your point about a single request. Anyway, here's
>> what happens when I use bs_rdwr_sync and run the following command:
>>
>> sgp_dd if=/dev/sdd of=/dev/null bs=512 bpt=1024 count=256 thr=8 time=1
>>
>> (single 128K READ command)
>>     
>
> I thought thr=8 meant multiple commands.  But the trace snippet
> below looks like just one, so fine.
>   

Yeah, I'm used to type thr=8 automatically :-) . Anyway, since count <
bpt, it will be a single command.
>   
>> tgtd: [14:33:51.382528] iser_rx_progress(1102) entry
>> tgtd: [14:33:51.382567] handle_wc(897) incoming cmd, len 76
>> tgtd: [14:33:51.382578] iscsi_iser_read(1302) buf 0x52a628 nbytes 48
>> tgtd: [14:33:51.382590] iser_parse_hdr(1266) control type PDU
>> tgtd: [14:33:51.382599] iser_parse_hdr(1272) rstag 4aa2003d va 37b0c000
>> tgtd: [14:33:51.382609] iscsi_scsi_cmd_rx_start(1466) 1 28 0 0 131072 1 17
>> tgtd: [14:33:51.382622] iscsi_rdma_alloc_data_buf(1660) malloc
>> 0x2b13ddc70000 sz 131072
>> tgtd: [14:33:51.382633] iscsi_task_queue(1411) 17 17 1
>> tgtd: [14:33:51.382644] target_cmd_queue(763) 0x536488 28 1
>>
>> Here, we submit the cmd to bs (I'm using a RAM disk):
>> tgtd: [14:33:51.382655] target_cmd_queue(783) 0x536488 28 1 1
>>
>> Here, it returns:
>> tgtd: [14:33:51.382984] bs_rdwr_sync_cmd_submit(129) io done 0x536488 28
>> 131072 131072
>>
>> 329 us looks like too much time. On another measurement, I got 276 us
>> (which is also not very good). Even if I measure the time that it takes
>> just to run pread64 (called from bs_rdwr_sync_cmd_submit), I get 273 us.
>>
>> Do you get different numbers on your system?
>>     
>
> Agreed, that's rather slow, 480 MB/s.  Something else is going on.
> Closest number I can lay my hands on says 350 kB was 94 us in the
> pread, 3800 MB/s.

What's your setup? I'm using a RAM disk that I found here:

http://marc.info/?l=linux-scsi&m=120331663227540&w=2

>   You should be measuring memory copy speed here.
>   

Do you mean that memory copy is 480 MB/sec? That's slow.

>   
>> Another question is - how does pread64 access the SCSI device? I
>> understand that it reads from /dev/sdX. Does it call sd? How? Is there
>> any memory copy involved? I'm asking that because I'm used to kernel
>> space where we just call scsi_do_req.
>>     
>
> It reads from wherever it put your device with ./tgtadm ...
> --backing-store ... .  Presumably a file on the file system, or a
> raw block device like /dev/sdb.

Of course. The question is - what is the interface between pread and
scsi-ml? That's what I still don't understand.

>   Either way, after the first read,
> the data stays in page cache, and pread is just a memcpy.
>
> You should watch if you're hitting the disk (vmstat 1) while doing
> some runs.

No, it doesn't hit the disk.

>   You might try strace -T to see what the target is doing
> with all its time.  If it's really just the pread or something we
> have overlooked.
>   

strace looks like that:

epoll_wait(3, {}, 1024, 2000) = 0 <2.000066>
epoll_wait(3, {}, 1024, 2000) = 0 <2.000068>
epoll_wait(3, {}, 1024, 2000) = 0 <2.000065>
epoll_wait(3, {}, 1024, 2000) = 0 <2.000065>
epoll_wait(3, {{EPOLLIN, {u32=5394880, u64=5394880}}}, 1024, 2000) = 1
<1.184798>
read(10, "\320\235R\0\0\0\0\0", 8) = 8 <0.000009>
pread(11, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
131072, 0) = 131072 <0.000336>
epoll_wait(3, {{EPOLLIN, {u32=5394880, u64=5394880}}}, 1024, 2000) = 1
<0.000088>
read(10, "\320\235R\0\0\0\0\0", 8) = 8 <0.000024>
epoll_wait(3, {{EPOLLIN, {u32=5394880, u64=5394880}}}, 1024, 2000) = 1
<0.000023>
read(10, "\320\235R\0\0\0\0\0", 8) = 8 <0.000024>
epoll_wait(3, {}, 1024, 2000) = 0 <1.998508>
epoll_wait(3, {}, 1024, 2000) = 0 <1.999959>

>   
>> tgtd: [14:33:51.383085] iscsi_rdma_rdma_write(1492) offset 0 len 131072
>> stag 4aa2003d va 37b0c000
>>
>> Here, we start RDMA:
>> tgtd: [14:33:51.383118] iscsi_rdma_event_modify(1634) tx ready removing
>> 0x52a510
>> tgtd: [14:33:51.383129] iscsi_scsi_cmd_tx_done(1707) more data or sense
>> or bidir 17
>>
>> Here, the notification from IB is received:
>> tgtd: [14:33:51.383227] iser_rx_progress(1102) entry
>>
>> It takes 109 us, which looks ok. On another measurement, I got 108 us.
>>     
>
> I'd measure from .383085 to .383227, getting 923 MB/s.  Reasonable
> network speed for SDR.
>
> 		-- Pete
>   



More information about the stgt mailing list