[Stgt-devel] dd fails with iSER

Alexander Nezhinsky nezhinsky
Sun Aug 12 12:12:32 CEST 2007

> > and tells us to ignore MaxRecvDataSegmentLength.  But it doesn't say
> > how we should figure out the limit for data-type PDUs, i.e. for RDMA
> > transfers, or even if there should be one.  The phrase "control
> > PDUs" means non-RDMA transfers.
> There are no "data-type" PDUs in ISER, that's why no limit for them is
> mentioned. Control type PDUs can carry unsolicited data, but that is true
> only for write ops. As to RDMA ops, the initiator communicates the size of
> transfer and registration key, allowing the target to do the transfer, be it
> read or write, in one or several RDMA ops, as many as it likes.

The iSER spec is silent about the granularity of RDMA transfers:
it says nothing about the meaning of MaxBurstLength and
MaxRecvDataSegmentLength when applied to the solicited data of
a write op or to the entire data of a read op.

On the other hand, it maps R2T PDUs to RDMA Reads
and Data-IN PDUs to RDMA Writes (changing their meaning), but does not
require that their sizes be governed by either
MaxBurstLength (for R2Ts) or MaxRecvDataSegmentLength
(for Data-INs).

Thus we can interpret them freely, from the formal point of view.
Moreover, this does not contradict the spirit of the protocol,
which makes all RDMA transfers a target's responsibility.

> > One approach would be to have the target RDMA the entire data
> > segment in a single operation.  This approach minimizes the
> > overhead, but doesn't let us pipeline and may not be possible for
> > large transfer sizes.  The OS won't let us pin all the memory
> > required for the transfer, perhaps.

Another approach is to break both read and write RDMA transfers into
smaller units, allowing internal queuing, pipelining and efficient use
of memory.

This means that the target should set for itself some internal values
of MaxBurstLength and Data-IN's MaxRecvDataSegmentLength.
These values will govern generation of R2Ts and Data-IN and these,
in turn, will initiate a series of RDMA transfers with the desired granularity.
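
The splitting described above can be sketched as follows. This is only
an illustration, not tgt code: the function names and both limit values
are hypothetical placeholders, and write_len stands for the solicited
part of a write only.

```python
# Hypothetical internal limits; placeholders, not recommended values.
INTERNAL_MAX_BURST_LENGTH = 256 * 1024  # governs R2T generation (RDMA Reads)
DATA_IN_MAX_DSL = 128 * 1024            # governs Data-IN generation (RDMA Writes)


def split_transfer(total_len, max_chunk):
    """Yield (offset, length) pairs covering total_len bytes,
    each at most max_chunk bytes long."""
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    offset = 0
    while offset < total_len:
        length = min(max_chunk, total_len - offset)
        yield offset, length
        offset += length


def plan_rdma_reads(write_len):
    """One entry per internal R2T, i.e. per RDMA Read (write op)."""
    return list(split_transfer(write_len, INTERNAL_MAX_BURST_LENGTH))


def plan_rdma_writes(read_len):
    """One entry per internal Data-IN, i.e. per RDMA Write (read op)."""
    return list(split_transfer(read_len, DATA_IN_MAX_DSL))
```

Each (offset, length) pair would then become one RDMA operation, so
the target can queue and pipeline them and only needs buffers of the
chunk size rather than of the whole transfer.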

> > Instead I've added another patch that changes the MaxRDSL in the
> > target to be whatever was negotiated for IRDSL.  Since I see no way
> > in the spec how the target could send data in a control type PDU,
> > IRDSL wasn't doing anything for us anyway.  And open-iscsi uses its
> > conn->max_recv_dlength as the starting point for IRDSL, which seems
> > reasonable.

One example of a target sending data in a control type PDU is a Response
PDU carrying sense data. Other examples are Text Responses outside the
Login phase and some Task mgmt Responses (for higher error levels).
Anyway, the negotiated IRDSL value does not explicitly affect the target.
It just guarantees that the initiator is able to receive our PDUs.

To summarize, the proposed approach uses the following policy:

1. If MaxRDSL declared by the other side is different from the negotiated
value of IRDSL, ignore it.
2. If no MaxRDSL was declared by the initiator, do not declare one
of your own. Otherwise declare the negotiated value of TRDSL.
3. In any case set the internal MRDSL values to those negotiated as

4. When negotiating MaxBurstLength agree with any value proposed by
the initiator (it won't be used anyway). When negotiating IRDSL agree
with any value proposed by the initiator (unless there are some special
considerations regarding the potential control PDUs), and then, of
course, use it when applicable.

5. Set some internal value for MaxBurstLength and use it to generate
R2T PDUs, effectively splitting the RDMA Read transfers into smaller
portions of limited size.
6. Introduce a new internal variable DataInMaxDSL that holds the
value of MRDSL to be internally applied to generation of Data-INs.
This effectively splits the RDMA Write transfers into smaller portions
of limited size.

The internal values of MaxBurstLength and DataInMaxDSL should be chosen
to achieve good performance while keeping memory allocation
requirements reasonable. They may be either hardcoded or tunable.
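
As a rough illustration of the negotiation side of the policy (steps 1,
2 and 4 only), here is a minimal sketch. The function name and the
dict-based representation of login keys are assumptions made for the
example, not the tgt implementation, and the negotiated TRDSL is simply
passed in, since how it is arrived at is outside this sketch.

```python
def target_login_reply(initiator_keys, negotiated_trdsl):
    """Build the target's reply keys (hypothetical helper).

    initiator_keys: dict of key name -> integer value sent by the initiator.
    negotiated_trdsl: the already negotiated TRDSL value.
    """
    reply = {}

    # Step 4: agree with whatever the initiator proposes for these keys.
    for key in ("MaxBurstLength", "InitiatorRecvDataSegmentLength"):
        if key in initiator_keys:
            reply[key] = initiator_keys[key]

    # Step 1: the value of the initiator's MaxRDSL is never consulted.
    # Step 2: declare our own MaxRDSL only if the initiator declared one,
    # and then declare the negotiated TRDSL.
    if "MaxRecvDataSegmentLength" in initiator_keys:
        reply["MaxRecvDataSegmentLength"] = negotiated_trdsl

    return reply
```

For instance, an initiator that sends MaxRecvDataSegmentLength=8192
gets back the negotiated TRDSL under that key, while an initiator that
omits the key gets no MaxRecvDataSegmentLength in the reply at all.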

Alexander Nezhinsky
