[stgt] [PATCH] sg-based backing store

FUJITA Tomonori fujita.tomonori at lab.ntt.co.jp
Wed Oct 8 08:14:45 CEST 2008


On Tue, 07 Oct 2008 17:54:19 +0200
Alexander Nezhinsky <nezhinsky at gmail.com> wrote:

> FUJITA Tomonori wrote:
> > On Sun, 05 Oct 2008 20:41:53 +0200
> > Alexander Nezhinsky <nezhinsky at gmail.com> wrote:
> > 
> >> This bs provides a significant performance improvement when
> >> working with native SCSI devices. In a setup where
> >> the SCSI devices are exported by a tgt with bs_null,
> >> and both links (from the initiator to the target and from the
> >> target to the "backing-store" target) are iSER/IB,
> >> a sustained bandwidth of 1450 MB/s for READ and
> >> 1350 MB/s for WRITE is achieved, compared to
> >> 700-800 MB/s when running with bs_rdwr in the same setup.
> >> Some improvement is seen in IOPS as well:
> >> 60 kIOPS for READ and 38 kIOPS for WRITE
> >> (compared to 31/35 kIOPS with bs_rdwr).
>  
> > I'm not sure what kind of workload you performed, but the performance
> > sounds too good to be true?
> 
> I just run sgp_dd with dio=1. Here is an example of such a setup.
> The target exports 3 devices:
> 
> tgtadm --mode target --op show
> ...
>         LUN: 1
>             Type: disk
>             SCSI ID: deadbeaf1:1
>             SCSI SN: beaf11
>             Size: 0 MB
>             Online: Yes
>             Removable media: No
>             Backing store: /dev/sg19
>         LUN: 2
>             Type: disk
>             SCSI ID: deadbeaf1:2
>             SCSI SN: beaf12
>             Size: 1099512 MB
>             Online: Yes
>             Removable media: No
>             Backing store: null_dev1
>         LUN: 3
>             Type: disk
>             SCSI ID: deadbeaf1:3
>             SCSI SN: beaf13
>             Size: 0 MB
>             Online: Yes
>             Removable media: No
>             Backing store: /dev/sg23
> 
> # sg_map -x -i
> ...
> /dev/sg19  3 0 0 12  0  /dev/sdr  DotHill   R/Evo 2730-2R    J200
> ...
> /dev/sg23  31 0 0 1  0  /dev/sdt  IET       VIRTUAL-DISK  0001
> 
> LUN1 is an FC device, /dev/sg19,
> added through tgtadm with "-E sg --backing-store=/dev/sg19"
> Local READ performance:
> # sgp_dd if=/dev/sg19 of=/dev/null bs=512 bpt=512 count=4M time=1 thr=10 dio=1
> time to transfer data was 5.227954 secs, 410.77 MB/sec
> 
> LUN2 is a bs_null backed device, added through tgtadm
> with "-E null --backing-store=null_dev1"
> 
> LUN3 is a device exported by another target, where it is
> bs_null backed; it is seen locally as /dev/sg23 and
> added through tgtadm with "-E sg --backing-store=/dev/sg23"
> Local READ performance:
> # sgp_dd if=/dev/sg23 of=/dev/null bs=512 bpt=512 count=4M time=1 thr=10 dio=1
> time to transfer data was 1.457757 secs, 1473.14 MB/sec
> 
> Initiator sees these devices as:
> /dev/sg22  48 0 0 0  12  IET       Controller  0001
> /dev/sg23  48 0 0 1  0  /dev/sdt  IET       VIRTUAL-DISK  0001
> /dev/sg24  48 0 0 2  0  /dev/sdu  IET       VIRTUAL-DISK  0001
> /dev/sg25  48 0 0 3  0  /dev/sdv  IET       VIRTUAL-DISK  0001
> 
> /dev/sg23 is LUN1 (FC backed thru bs_sg)
> # sgp_dd if=/dev/sg23 of=/dev/null bs=512 bpt=512 count=4M time=1 thr=4 dio=1
> time to transfer data was 5.276332 secs, 407.00 MB/sec
> 
> /dev/sg24 is LUN2 (bs_null backed)
> # sgp_dd if=/dev/sg24 of=/dev/null bs=512 bpt=512 count=4M time=1 thr=4 dio=1
> time to transfer data was 1.378969 secs, 1557.31 MB/sec
> 
> /dev/sg25 is LUN3 (iSER/IB to another target where it is bs_null backed)
> # sgp_dd if=/dev/sg25 of=/dev/null bs=512 bpt=512 count=4M time=1 thr=4 dio=1
> time to transfer data was 1.475433 secs, 1455.49 MB/sec
> 
> Thus bs_sg in this patch approaches local FC performance
> to within a few MB/s.
> 
> Also, the gap between the pure null device and a null device
> exported through iSER/IB (simulating "fast" storage) is within
> 100 MB/s out of ~1500 MB/s.
> 
> Similar relative measurements are obtained for WRITE.

Thanks for the details.


> > This patch means we don't do any caching (like using page cache).
> > 
> > 1. it might lead to poor performance in real environments (not
> > benchmarks).
> 
> This patch is intended for fast storage, where using a
> cache may become a bottleneck rather than a benefit.
> 
> Caching helps when there is a slower network behind it.
> But faster networks and faster buses are coming,
> with speeds comparable to memory access.

Ok,


> Where do you see the benchmark features that may 
> obscure the real-world performance?

My point is that the performance of real-world workload benchmarks
like dbench is more relevant to users than the performance of
sequential accesses.

We developers can learn a lot from the performance of sequential
accesses and use it to work on the code. But for users, what matters
is how well tgtd performs in their environments, and the workloads
there are likely far from simple sequential accesses.


> Do you think that using a simulated device, such as a null device
> exported through iSER/IB, is a pure benchmark? But the target cannot
> know that it is simulated; it is a SCSI device just like any other.
>  
> > 2. The DIO and AIO backing store code does something similar (avoids
> > threads but does no caching). DIO and AIO work for any device, so
> > they might be useful? (Though they are slower than this, since AIO
> > and DIO are more complicated than sg data transfer.)
> 
> Sure, two of the features gained by using sg are asynchronous and
> direct I/O, which can also be obtained with AIO+DIO. One of the
> problems is slower access, as you pointed out. Another is limited
> support on older distributions. To use bs_sg there is no need to
> install a newer kernel or patch the existing one, for example when
> using something like RHEL5 with a 2.6.18 kernel.

ok,

Adding this feature is fine, but I think there are still some issues
if it is meant for more than performance measurements. For example,
you need to take care of WCE (write cache enable). At the least, you
need to issue SYNCHRONIZE_CACHE when necessary.


> I think there may also be a place for adding some of the features
> that you have termed "passthrough", but that is another issue and
> I'll write a separate mail on it :)

FYI, 'pass through' is a common SCSI term.
--
To unsubscribe from this list: send the line "unsubscribe stgt" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


