[Stgt-devel] Tuning iSER for performance
Erez Zilber
erezz
Sun Mar 2 14:17:16 CET 2008
Pete Wyckoff wrote:
> erezz at Voltaire.COM wrote on Sun, 10 Feb 2008 14:54 +0200:
>
>> I'm trying to run some performance tests on iSER. I'm using the latest
>> stgt build (tgt-20071227) against open-iscsi r865.15. When I run sgp_dd
>> on a single LUN directly from the target machine, I get ~400 MB/sec
>> (which is ok because it's the max throughput of that device). When I run
>> it from the initiator, I get only ~240 MB/sec.
>>
>> Did you use any specific configuration when you did your measurements?
>> Pete - according to your numbers, you got wire speed. I wonder why do I
>> get 40% less than the physical storage throughput. BTW - when I use
>> another iSCSI over iSER target, I get 400 MB/sec from the initiator.
>>
>
> Here's a patch that will log timestamps. I use these to pinpoint
> where the time is going. First check for anomalies: plot it up
> with line number on the x axis, time stamp on y, make sure it is
> smooth.
It is almost smooth. From time to time, you can see that it takes time
before iser_rx_progress is called. I guess that the cq is empty for a
short time. I don't think that this is the real problem.
> Then pick one operation: zoom in with the plot and your
> editor to look at the times for a single read/write, from when the
> request first appears to when the response goes out.
>
Here's an example of a single command execution:
27892 tgtd: [16:40:38.907547] iser_rx_progress(1101) entry
27893 tgtd: [16:40:38.907558] handle_wc(897) incoming cmd, len 76
27894 tgtd: [16:40:38.907568] iscsi_iser_read(1301) buf 0x529fc8 nbytes 48
27895 tgtd: [16:40:38.907578] iser_parse_hdr(1265) control type PDU
27896 tgtd: [16:40:38.907587] iser_parse_hdr(1271) rstag 9b320086 va
3f1fc000
27897 tgtd: [16:40:38.907597] iscsi_scsi_cmd_rx_start(1466) 1 28 0 0
524288 1 b
27898 tgtd: [16:40:38.907607] iscsi_rdma_alloc_data_buf(1659) malloc
0x2acf61554000 sz 524288
27899 tgtd: [16:40:38.907617] iscsi_task_queue(1411) 388b 388b 1
27900 tgtd: [16:40:38.907627] target_cmd_queue(763) 0x535de8 28 1
27901 tgtd: [16:40:38.907637] target_cmd_queue(783) 0x535de8 28 1 1
this is where bs work starts:
27902 tgtd: [16:40:38.907649] tgtd: [16:40:38.907650]
bs_thread_worker_fn(151) got 0x535de8
27903 target_cmd_queue(794) b 28 (nil) 0x2acf61554000 262144 0 524288 0 4
27904 tgtd: [16:40:38.907676] iscsi_rdma_event_modify(1628) tx ready
adding 0x529eb0
27905 tgtd: [16:40:38.907686] handle_wc(922) incoming cmd proc done, repost
27906 tgtd: [16:40:38.907697] iser_tx_progress(1017) entry
27907 tgtd: [16:40:38.907706] iser_tx_progress(1028) trying tx
27908 tgtd: [16:40:38.907716] iscsi_task_tx_start(1784) no more data
27909 tgtd: [16:40:38.907725] iscsi_rdma_event_modify(1633) tx ready
removing 0x529eb0
this is where bs work ends (130 us):
27910 tgtd: [16:40:38.907779] bs_rdwr_request(114) io done 0x535de8 28
524288 524288
27911 tgtd: [16:40:38.907796] bs_thread_ack_fn(82) found 0x535de8
27912 tgtd: [16:40:38.907813] bs_thread_request_done(122) back to tgtd,
0x535de8
27913 tgtd: [16:40:38.907825] iscsi_rdma_event_modify(1628) tx ready
adding 0x529eb0
27914 tgtd: [16:40:38.907837] iser_tx_progress(1017) entry
27915 tgtd: [16:40:38.907846] iser_tx_progress(1028) trying tx
27916 tgtd: [16:40:38.907856] iscsi_task_tx_start(1759) found a task b
524288 0 0
27917 tgtd: [16:40:38.907865] iscsi_data_rsp_build(974) 524288 524288
524288 524288 b
27918 tgtd: [16:40:38.907875] iscsi_rdma_rdma_write(1491) offset 0 len
524288 stag 9b320086 va 3f1fc000
27919 tgtd: [16:40:38.907885] iscsi_rdma_event_modify(1633) tx ready
removing 0x529eb0
27920 tgtd: [16:40:38.907895] iscsi_scsi_cmd_tx_done(1707) more data or
sense or bidir b
27921 tgtd: [16:40:38.908269] iser_rx_progress(1101) entry
27922 tgtd: [16:40:38.908280] handle_wc(932) RDMA write done
27923 tgtd: [16:40:38.908289] iscsi_rdma_event_modify(1628) tx ready
adding 0x529eb0
27924 tgtd: [16:40:38.908298] iser_tx_progress(1017) entry
27925 tgtd: [16:40:38.908308] iser_tx_progress(1028) trying tx
27926 tgtd: [16:40:38.908317] iscsi_task_tx_start(1759) found a task b
524288 524288 0
27927 tgtd: [16:40:38.908327] iscsi_cmd_rsp_build(901) 0x535d28 28
27928 tgtd: [16:40:38.908336] iscsi_iser_write_begin(1352) new sendl
0x2aaab45f7090 len 48
27929 tgtd: [16:40:38.908347] iscsi_iser_write_end(1387) sendl
0x2aaab45f7090 len 76
27930 tgtd: [16:40:38.908356] iscsi_iser_write_end(1402) inc progress to
finish cmd
27931 tgtd: [16:40:38.908366] __cmd_done(855) 8 (nil) 0x2acf61554000 0
524288 0
27932 tgtd: [16:40:38.908376] iscsi_rdma_free_data_buf(1678) free
0x2acf61554000
It seems that bs work takes a lot of time. For example: if I run
disktest on the same device directly from the target machine (i.e. tgt
is not involved) in the following way:
disktest -PT -T100 -h1 -K64 -B256k -ID /dev/sdc -r
I get ~13000 iops -> each IO takes ~77 us (compared to 130 us only for
bs work with tgt). I'm not familiar with the bs code in tgt, so I'm not
sure what should be done here.
Another thing that looks strange - from looking at the logs, I don't see
that work is done in parallel. A command is received, iSCSI processing
is done and it is sent to bs processing (which seems to be idle until
that point in time). Now, nothing happens until bs processing is
complete. I will try to look more into the code and try to understand
it, but can someone try to elaborate on this?
> I suspect you'll find lots of the time to be in bs_rdwr servicing
> the IO. In which case you'll want to investigate making that
> faster: more readahead on reads, bigger write caching. I've never
> tried to use real devices though.
>
> Will be interested to see what kind of bottlenecks you find.
>
> -- Pete
>
> commit f94c0548ea22223ccf3bf66558598336b3363cfc
> Author: Pete Wyckoff <pw at osc.edu>
> Date: Tue, 16 Oct 2007 15:21:33 -0400
>
> local logging changes
>
> Do not use syslog for error messages, just send them to stdout/stderr where
> they can be redirected. Add NDEBUG ifdef around log_debug to avoid
> compiling those lines for optimized performance, and fix a compiler warning
> that falls out due to this. Put timestamps on messages for performance
> debugging.
> ---
> usr/log.c | 19 +++++++++++++++----
> usr/log.h | 6 ++++++
> usr/spc.c | 2 ++
> 3 files changed, 23 insertions(+), 4 deletions(-)
>
> diff --git a/usr/log.c b/usr/log.c
> index 4a30f05..cb601f6 100644
> --- a/usr/log.c
> +++ b/usr/log.c
> @@ -24,9 +24,11 @@
> #include <unistd.h>
> #include <syslog.h>
> #include <signal.h>
> +#include <time.h>
> #include <sys/shm.h>
> #include <sys/ipc.h>
> #include <sys/types.h>
> +#include <sys/time.h>
>
> #include "log.h"
>
> @@ -235,7 +237,7 @@ static void log_syslog (void * buff)
>
> static void dolog(int prio, const char *fmt, va_list ap)
> {
> - if (is_daemon) {
> + if (0 && is_daemon) {
> la->ops[0].sem_op = -1;
> if (semop(la->semid, la->ops, 1) < 0) {
> syslog(LOG_ERR, "semop up failed");
> @@ -250,7 +252,16 @@ static void dolog(int prio, const char *fmt, va_list ap)
> return;
> }
> } else {
> - fprintf(stderr, "%s: ", log_name);
> + struct timeval tv;
> + time_t tp;
> + char buffer[16];
> +
> + gettimeofday(&tv, NULL);
> + tp = tv.tv_sec;
> + strftime(buffer, 9, "%H:%M:%S", localtime(&tp));
> + sprintf(buffer+8, ".%06ld", (long) tv.tv_usec);
> +
> + fprintf(stderr, "%s: [%s] ", log_name, buffer);
> vfprintf(stderr, fmt, ap);
> fflush(stderr);
> }
> @@ -308,7 +319,7 @@ int log_init(char *program_name, int size, int daemon, int debug)
>
> logdbg(stderr,"enter log_init\n");
> log_name = program_name;
> - if (is_daemon) {
> + if (0 && is_daemon) {
> struct sigaction sa_old;
> struct sigaction sa_new;
> pid_t pid;
> @@ -349,7 +360,7 @@ int log_init(char *program_name, int size, int daemon, int debug)
>
> void log_close (void)
> {
> - if (is_daemon) {
> + if (0 && is_daemon) {
> closelog();
> free_logarea();
> }
> diff --git a/usr/log.h b/usr/log.h
> index 35ce374..40eb739 100644
> --- a/usr/log.h
> +++ b/usr/log.h
> @@ -76,9 +76,15 @@ do { \
> log_error("%s(%d) " fmt, __FUNCTION__, __LINE__, ##args); \
> } while (0)
>
> +#ifndef NDEBUG
> #define dprintf(fmt, args...) \
> do { \
> log_debug("%s(%d) " fmt, __FUNCTION__, __LINE__, ##args); \
> } while (0)
> +#else
> +#define dprintf(fmt, args...) \
> +do { \
> +} while (0)
> +#endif
>
> #endif /* LOG_H */
> diff --git a/usr/spc.c b/usr/spc.c
> index 7802f76..bdba471 100644
> --- a/usr/spc.c
> +++ b/usr/spc.c
> @@ -533,6 +533,7 @@ exit:
>
> void dump_cdb(struct scsi_cmd *cmd)
> {
> +#ifndef NDEBUG
> uint8_t *cdb = cmd->scb;
>
> switch(cmd->scb_len) {
> @@ -561,6 +562,7 @@ void dump_cdb(struct scsi_cmd *cmd)
> cdb[12], cdb[13], cdb[14], cdb[15]);
> break;
> }
> +#endif
> }
>
> int spc_illegal_op(int host_no, struct scsi_cmd *cmd)
>
<http://www.voltaire.com/>
More information about the stgt
mailing list