[stgt] [PATCH] sg-based backing store

Alexander Nezhinsky nezhinsky at gmail.com
Tue Oct 7 17:54:19 CEST 2008


FUJITA Tomonori wrote:
> On Sun, 05 Oct 2008 20:41:53 +0200
> Alexander Nezhinsky <nezhinsky at gmail.com> wrote:
> 
>> This bs provides significant performance improvement when 
>> working with native scsi devices. In a setup, where 
>> the scsi devices are exported by a tgt with bs_null,
>> and both links (from initiator to target and from the 
>> target to the "backing-store" target) are iSER/IB
>> sustained bandwidth of 1450 MB/s for READ and
>> 1350 MB/s for WRITE is achieved. This to be compared to
>> 700-800 MB/s when running with bs_rdwr in the same setup.
>> Some improvements are seen with IOPS as well:
>> 60 kIOPS for READ, 38 kIOPS for WRITE
>> (compared to 31/35KIOPS with bs_rdwr).
 
> I'm not sure what kind of workload you perform, but the performance
> sounds too good?

I just run sgp_dd with dio=1. Here is an example of such a setup.
The target exports 3 devices:

tgtadm --mode target --op show
...
        LUN: 1
            Type: disk
            SCSI ID: deadbeaf1:1
            SCSI SN: beaf11
            Size: 0 MB
            Online: Yes
            Removable media: No
            Backing store: /dev/sg19
        LUN: 2
            Type: disk
            SCSI ID: deadbeaf1:2
            SCSI SN: beaf12
            Size: 1099512 MB
            Online: Yes
            Removable media: No
            Backing store: null_dev1
        LUN: 3
            Type: disk
            SCSI ID: deadbeaf1:3
            SCSI SN: beaf13
            Size: 0 MB
            Online: Yes
            Removable media: No
            Backing store: /dev/sg23

# sg_map -x -i
...
/dev/sg19  3 0 0 12  0  /dev/sdr  DotHill   R/Evo 2730-2R    J200
...
/dev/sg23  31 0 0 1  0  /dev/sdt  IET       VIRTUAL-DISK  0001

LUN1 is an FC device, /dev/sg19,
added thru tgtadm with "-E sg --backing-store=/dev/sg19"
Local READ performance:
# sgp_dd if=/dev/sg19 of=/dev/null bs=512 bpt=512 count=4M time=1 thr=10 dio=1
time to transfer data was 5.227954 secs, 410.77 MB/sec

LUN2 is a bs_null backed device, added thru tgtadm 
with "-E null --backing-store=null_dev1"

LUN3 is a device exported by another target, where it is 
bs_null backed, seen as /dev/sg23,
added thru tgtadm with "-E sg --backing-store=/dev/sg23"
Local READ performance:
# sgp_dd if=/dev/sg23 of=/dev/null bs=512 bpt=512 count=4M time=1 thr=10 dio=1
time to transfer data was 1.457757 secs, 1473.14 MB/sec

Initiator sees these devices as:
/dev/sg22  48 0 0 0  12  IET       Controller  0001
/dev/sg23  48 0 0 1  0  /dev/sdt  IET       VIRTUAL-DISK  0001
/dev/sg24  48 0 0 2  0  /dev/sdu  IET       VIRTUAL-DISK  0001
/dev/sg25  48 0 0 3  0  /dev/sdv  IET       VIRTUAL-DISK  0001

/dev/sg23 is LUN1 (FC backed thru bs_sg)
# sgp_dd if=/dev/sg23 of=/dev/null bs=512 bpt=512 count=4M time=1 thr=4 dio=1
time to transfer data was 5.276332 secs, 407.00 MB/sec

/dev/sg24 is LUN2 (bs_null backed)
# sgp_dd if=/dev/sg24 of=/dev/null bs=512 bpt=512 count=4M time=1 thr=4 dio=1
time to transfer data was 1.378969 secs, 1557.31 MB/sec

/dev/sg25 is LUN3 (iSER/IB to another target where it is bs_null backed)
# sgp_dd if=/dev/sg25 of=/dev/null bs=512 bpt=512 count=4M time=1 thr=4 dio=1
time to transfer data was 1.475433 secs, 1455.49 MB/sec
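
The MB/sec figures above can be cross-checked from the command-line
parameters. Below is a minimal Python sketch, not part of the patch or of
sg3_utils, assuming sgp_dd reads the "M" suffix in count=4M as 2^20 blocks
and reports MB as 10^6 bytes (the function name is made up for illustration):

```python
def throughput_mb_s(bs, count, seconds):
    """Return decimal MB/s for `count` blocks of `bs` bytes moved in `seconds`."""
    return bs * count / seconds / 1e6

# Assumption: count=4M means 4 * 2**20 blocks; with bs=512 that is 2 GiB total.
BLOCKS = 4 * 2**20

# Elapsed times in seconds, copied from the sgp_dd runs above.
runs = {
    "target, local /dev/sg19 (FC)": 5.227954,
    "target, local /dev/sg23 (iSER/IB, bs_null)": 1.457757,
    "initiator, LUN1 (FC thru bs_sg)": 5.276332,
    "initiator, LUN2 (bs_null)": 1.378969,
    "initiator, LUN3 (iSER/IB thru bs_sg)": 1.475433,
}

for name, secs in runs.items():
    print(f"{name}: {throughput_mb_s(512, BLOCKS, secs):.2f} MB/sec")
```

Under these assumptions the sketch reproduces the rates reported by sgp_dd,
which suggests that each run moved 2 GiB.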

Thus bs_sg in the patch comes within a few MB/s of local FC
performance (407.00 MB/s through the target vs. 410.77 MB/s locally).

Also, the gap between the pure null device and a null device
exported through iSER/IB, simulating "fast" storage, is about
100 MB/s out of ~1500 MB/s (1557.31 vs. 1455.49 MB/s).
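
To put both gaps in relative terms, here is a small Python sketch using the
READ rates measured above (the helper name is only for illustration and is
not part of the patch):

```python
def relative_gap(reference, measured):
    """Fraction of the reference rate that is lost on the measured path."""
    return (reference - measured) / reference

# READ rates in MB/s, copied from the sgp_dd runs above.
fc_gap = relative_gap(410.77, 407.00)      # local FC vs. same device thru bs_sg
null_gap = relative_gap(1557.31, 1455.49)  # bs_null LUN vs. iSER/IB LUN thru bs_sg

print(f"FC thru bs_sg:      {fc_gap:.1%} below local FC")
print(f"iSER/IB thru bs_sg: {null_gap:.1%} below bs_null")
```

That works out to roughly 1% for the FC case and about 6.5% for the
simulated fast device.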

Similar relative measurements are obtained for WRITE.
 
> 
> This patch means we don't do any caching (like using page cache).
> 
> 1. it might lead to poor performance in real environments (not
> benchmarks).

This patch is intended for fast storage, where using a
cache may become a bottleneck rather than a relief.

A cache helps when there is a slower network behind the target.
But faster networks and faster buses are coming, and their
speeds are becoming comparable to memory access speeds.

Which features of the benchmark do you think may
obscure the real-world performance?

Do you think that using a simulated device, such as a null device
exported thru iSER/IB, makes it a pure benchmark? The target can't know
that the device is simulated; to it, this is a SCSI device like any other.
 
> 2. DIO and AIO backing store code does similar (avoid threads but
> don't do any caching). DIO and AIO works for any devices so it might
> be useful? (though it is slower than this since AIO and DIO is more
> complicated than sg data transfer)

Sure, two of the features gained by using SG are async and direct IO,
which can also be obtained with AIO+DIO. One problem is slower
access, as you pointed out. Another is limited support in older
distributions. Using bs_sg requires neither installing a newer kernel
nor patching the existing one, for example when running something like
RHEL5 with 2.6.18 inside.

I think there might also be a place for adding some of the features that
you have termed "passthrough", but this is another issue and I'll write
a separate mail on it :)

Alexander Nezhinsky 
