I am running tests between two systems:

  iSCSI server: 4 CPUs, 6 GB RAM, 6 x 146 GB SAS HDDs
  iSCSI client: 8 CPUs, 16 GB RAM

Both systems are on the same 10GbE subnet, and both run Linux kernel 2.6.18 (RHEL 5.3). I am using version 1.0.3 of the SCSI target utilities, downloaded from stgt.sourceforge.net and built locally, because the version that came with the machines didn't support the --control-port option I needed in order to manage multiple tgtd processes.

I tested two different configurations for exporting the 6 drives to the client. (I've sketched the commands for both in a P.S. at the end of this message.)

In the first, I used one tgtd process serving 6 LUNs, one per drive. The client uses iscsiadm to discover this host:port and finds all 6 LUNs, so I end up with 6 devices on the client, /dev/sd[a-f]. I spawn 6 simultaneous dd processes on the client, each writing to one of the devices with /dev/zero as input. Looking at the server with sar, I see each disk getting about 35 MB/s of writes, nowhere near saturation (as shown by the %util column of sar -d). The CPU where tgtd is running shows consistently zero idle time, low user time, low I/O wait time, and high system time. I have not instrumented tgtd to see exactly what it is doing, but my guess is that it spends most of its time servicing the network. The other three CPUs show very high idle times, and the aggregate CPU idle time on the server is about 50%.

The second test is the same as the first, except that I now use 6 tgtd processes, one per disk, each listening on a separate port. In this test each disk gets about 60 MB/s of writes, and all show 100% utilization. All four CPUs now run at nearly 0% idle, so the aggregate CPU idle time on the server is about 0%. 60 MB/s is about the maximum I get from these disks when writing to them with local, non-networked I/O.

It should be noted that the client spawns just one set of scsi_eh/scsi_wq kernel threads in the first case and six in the second. I don't know exactly what those threads do, but they never show up in the CPU reports, so they do not appear to be a bottleneck in either test. Client CPU usage is very similar in both tests, running almost 50% idle and dominated by the dd and pdflush processes.

As I said, I have not instrumented tgtd, but it seems pretty clear that a single tgtd process becomes CPU-bound and cannot do enough local I/O to saturate all of the disks. I'm happy to instrument and re-run if that would help.

Regards
-- Steve

On 2010-04-06 18:16, FUJITA Tomonori wrote:
> On Tue, 06 Apr 2010 07:18:46 -0700
> Steven Wertheimer <swerthei at gmail.com> wrote:
>
>> Hello. Please pardon this post from someone new to the list if this
>> is irrelevant, but I've recently been involved in setting up iSCSI on
>> a performance-testing cluster, and I have observed that a single
>> tgtd process does seem to be a bottleneck in a high-throughput
>> (10GbE) environment, and that performance improves when I
>> use multiple tgtd processes.
>
> Can you analyze what processing is exactly the bottleneck?
>
>> If you want the details about this configuration, please let me know.
>
> Yeah, please.
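
P.S. In case it helps anyone reproduce this, here is roughly how the first (single-tgtd) configuration was set up on the server. This is a sketch from memory; the IQN, IP addresses, and server-side device names below are examples, not the exact values I used:

  # Start one tgtd instance (default iSCSI port 3260, default control port).
  tgtd

  # Create one target and attach all six disks as LUNs 1-6.
  tgtadm --lld iscsi --op new --mode target --tid 1 \
         --targetname iqn.2010-04.test:disks
  lun=1
  for disk in /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg; do
      tgtadm --lld iscsi --op new --mode logicalunit --tid 1 \
             --lun $lun --backing-store $disk
      lun=$((lun + 1))
  done

  # Allow any initiator to connect.
  tgtadm --lld iscsi --op bind --mode target --tid 1 --initiator-address ALL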
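On the client, discovery and login were the usual open-iscsi incantations, followed by six parallel dd writers. Again, the address and the dd sizes here are illustrative:

  # Discover and log in to the target.
  iscsiadm -m discovery -t sendtargets -p 192.168.10.1:3260
  iscsiadm -m node -T iqn.2010-04.test:disks -p 192.168.10.1:3260 --login

  # One dd per imported device, all running in parallel.
  for dev in /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf; do
      dd if=/dev/zero of=$dev bs=1M count=10240 &
  done
  wait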
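The second configuration launches six tgtd processes, each with its own iSCSI portal and its own control port, which is where --control-port comes in. It was something along these lines (the portal option syntax and port numbers are from memory, so treat this as a sketch):

  i=1
  for disk in /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg; do
      # One tgtd per disk: iSCSI portal on port 326$i, control port 1000$i.
      tgtd --iscsi portal=192.168.10.1:326$i --control-port 1000$i
      tgtadm --control-port 1000$i --lld iscsi --op new --mode target \
             --tid 1 --targetname iqn.2010-04.test:disk$i
      tgtadm --control-port 1000$i --lld iscsi --op new --mode logicalunit \
             --tid 1 --lun 1 --backing-store $disk
      tgtadm --control-port 1000$i --lld iscsi --op bind --mode target \
             --tid 1 --initiator-address ALL
      i=$((i + 1))
  done

The client then runs one iscsiadm discovery and login per port, and the same dd loop as above.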
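The numbers I quoted came from sar running on the server, roughly like this:

  # Per-disk throughput and %util, 5-second samples.
  sar -d 5

  # Per-CPU breakdown (user/system/iowait/idle), which is where the
  # CPU running tgtd shows up at 0% idle in the single-tgtd case.
  sar -P ALL 5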