erezz at Voltaire.COM wrote on Thu, 06 Mar 2008 12:10 +0200:
> Here's a patch that works with the current version of stgt.

Thanks for fixing it up; I'll hang onto it for debugging. Hopefully
the new sync-range code you added isn't actually getting used in your
performance tests. I doubt it.

> Now, the performance is even lower (~460 MB/sec with rdwr_sync
> compared to ~670 MB/sec with rdwr). I've noticed that it takes a lot
> of time between target_cmd_queue (time = 663673) &
> iscsi_task_tx_start (669209).

5.5 ms is impossibly slow, unless you're writing to disk and/or
syncing to disk.

You need to decide what you want to test. Total throughput needs
threading on the target for best performance, and multiple outstanding
commands on the initiator. Latency tests are best for understanding
which part of the system is being "slow": network, disk, context
switch, etc. To test latency effectively you need to ensure that only
1 command is outstanding.

> I don't understand something in the behavior of
> iscsi_task_tx_start (this may be related to the long time
> mentioned above): when it is called, it handles only the 1st task
> in conn->tx_clist.

This would only matter in the multiple-command case, just to point out
the difference again.

> Why doesn't it try to handle all tasks on the
> list? What happens is that after bs completes is work, it takes a
> lot of time until iscsi_task_tx_start is called for that task.

That definitely sounds like a problem. So just getting into
iscsi_task_tx_start is an issue, even if you only need to be there for
a single task.

> iscsi_task_tx_start *is* called immediately, but it handles the
> 1st task only (so the current task has to wait for this thread to
> wake up multiple times until it will be handled). Can anyone
> explain this design?

After it handles that task, it goes back to the main loop of
iser_tx_progress. This function will continue to be called as long as
num_tx_ready is non-zero. Various points increment that:
conn->tp->event_modify(.., ..|EPOLLOUT) and some completion events
from the NIC.

This is just like how TCP works. We let the top-level epoll() drive
all the events for all the connections, with this added counter so
that non-pollable RDMA events can be tracked too.

If you narrow down big delays, like the 5.5 ms, to exactly two points,
then look at the code and figure out what has to happen to get from
one to the other, that will help us figure out what to fix. That is
what happened in the previous mail, where it looked like getting into
the RX progress function was slow, indicating something about
notifications from the NIC or a bug on that relatively short path.

-- Pete
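P.S. To make the "one task per call, keep calling while num_tx_ready is
non-zero" pattern concrete, here is a toy Python model. It is only a
sketch of the idea, not stgt code: Conn, Loop, queue_response, tx_start
and tx_progress are made-up names, and it assumes one counter increment
per queued task, which may not match what the real code does. Only
num_tx_ready and tx_clist are borrowed from the discussion above.

```python
from collections import deque


class Conn:
    """Toy stand-in for a connection (hypothetical, not the stgt struct)."""

    def __init__(self, name):
        self.name = name
        self.tx_clist = deque()  # tasks waiting to be transmitted


class Loop:
    """Toy top-level event loop with a counter for non-pollable work."""

    def __init__(self):
        self.num_tx_ready = 0  # pending "please run TX" requests
        self.ready = deque()   # connections that asked for TX progress
        self.sent = []         # (loop pass, connection name, task) records
        self.iterations = 0

    def queue_response(self, conn, task):
        # Assumption: each finished task bumps the counter exactly once.
        conn.tx_clist.append(task)
        self.ready.append(conn)
        self.num_tx_ready += 1

    def tx_start(self, conn):
        # Handle only the first task on the list, then return to the loop.
        task = conn.tx_clist.popleft()
        self.sent.append((self.iterations, conn.name, task))

    def tx_progress(self):
        # One pass serves one request for one connection.
        conn = self.ready.popleft()
        self.num_tx_ready -= 1
        self.tx_start(conn)

    def run(self):
        # A real loop would also wait in epoll for pollable descriptors;
        # here only the counter-driven part is modelled.
        while self.num_tx_ready:
            self.iterations += 1
            self.tx_progress()


if __name__ == "__main__":
    loop = Loop()
    a, b = Conn("a"), Conn("b")
    loop.queue_response(a, "t1")
    loop.queue_response(a, "t2")
    loop.queue_response(b, "t3")
    loop.run()
    for record in loop.sent:
        print(record)
```

In this model a task queued behind others waits for one loop pass per
earlier request, so recording the pass number next to each task is a
cheap way to see whether a delay comes from queueing or from something
outside the loop.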