> > and tells us to ignore MaxRecvDataSegmentLength. But it doesn't say
> > how we should figure out the limit for data-type PDUs, i.e. for RDMA
> > transfers, or even if there should be one. The phrase "control
> > PDUs" means non-RDMA transfers.
>
> There are no "data-type" PDUs in ISER, that's why no limit for them is
> mentioned. Control type PDUs can carry unsolicited data, but that is true
> only for write ops. As to RDMA ops, the initiator communicates the size of
> transfer and registration key, allowing the target to do the transfer, be it
> read or write, in one or several RDMA ops, as many as it likes.

The iSER spec is silent about the granularity of RDMA transfers because it
says nothing about the meaning of MaxBurstLength and
MaxRecvDataSegmentLength when applied to the solicited data of a write op
and to the entire data of a read op. On the other hand, it maps R2T PDUs
to RDMA Reads and Data-IN PDUs to RDMA Writes (changing their meaning), but
does not require that their sizes be governed by either MaxBurstLength
(for R2Ts) or MaxRecvDataSegmentLength (for Data-INs). Thus, from the
formal point of view, we can interpret them freely. Moreover, this does
not contradict the spirit of the protocol, which makes all RDMA transfers
the target's responsibility.

> > One approach would be to have the target RDMA the entire data
> > segment in a single operation. This approach minimizes the
> > overhead, but doesn't let us pipeline and may not be possible for
> > large transfer sizes. The OS won't let us pin all the memory
> > required for the transfer, perhaps.

Another approach is to break both read and write RDMA transfers into
smaller units, allowing internal queuing, pipelining and efficient use of
memory. This means that the target should set for itself some internal
values of MaxBurstLength and Data-IN's MaxRecvDataSegmentLength.
These values will govern generation of R2Ts and Data-INs, and these, in
turn, will initiate a series of RDMA transfers with the desired
granularity.

> > Instead I've added another patch that changes the MaxRDSL in the
> > target to be whatever was negotiated for IRDSL. Since I see no way
> > in the spec how the target could send data in a control type PDU,
> > IRDSL wasn't doing anything for us anyway. And open-iscsi uses its
> > conn->max_recv_dlength as the starting point for IRDSL, which seems
> > reasonable.

One example of a target sending data in a control type PDU is a Response
PDU carrying sense data. Other examples are Text Responses outside the
Login phase and some Task Management Responses (for higher error levels).
Anyway, the negotiated IRDSL value doesn't explicitly affect the target.
It just guarantees that the initiator is able to receive our PDUs.

To summarize, the proposed approach uses the following policy:

1. If the MaxRDSL declared by the other side is different from the
   negotiated value of IRDSL, ignore it.

2. If no MaxRDSL was declared by the initiator, do not declare one of
   your own. Otherwise declare the negotiated value of TRDSL.

3. In any case, set the internal MRDSL values to those negotiated as
   IRDSL, ORDSL.

4. When negotiating MaxBurstLength, agree with any value proposed by the
   initiator (it won't be used anyway). When negotiating IRDSL, agree
   with any value proposed by the initiator (unless there are some
   special considerations regarding the potential control PDUs), and
   then use it, of course, when applicable.

5. Set some internal value for MaxBurstLength and use it to generate R2T
   PDUs, effectively splitting the RDMA Read transfers into smaller
   portions of limited size.

6. Introduce a new internal variable DataInMaxDSL that holds the value
   of MRDSL to be internally applied to generation of Data-INs. This
   effectively splits the RDMA Write transfers into smaller portions of
   limited size.
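
Points 5 and 6 amount to the same splitting loop applied with two
different limits. A minimal sketch in C, assuming a hypothetical
descriptor struct rdma_chunk and a hypothetical helper split_transfer()
(neither name comes from the target code or from the spec), might look
like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one RDMA portion (illustration only). */
struct rdma_chunk {
    uint32_t offset;  /* offset of this portion within the task data */
    uint32_t length;  /* number of bytes moved by this portion */
};

/*
 * Split a transfer of total_len bytes into portions of at most max_len
 * bytes. max_len stands in for the internal MaxBurstLength (write op:
 * one R2T, i.e. one RDMA Read, per portion) or for DataInMaxDSL (read
 * op: one Data-IN, i.e. one RDMA Write, per portion).
 *
 * At most max_chunks descriptors are stored in out. The return value is
 * the number of portions the transfer needs, which may exceed
 * max_chunks, so the caller can detect truncation.
 */
static size_t split_transfer(uint32_t total_len, uint32_t max_len,
                             struct rdma_chunk *out, size_t max_chunks)
{
    size_t n = 0;
    uint32_t offset = 0;

    assert(max_len > 0);

    while (offset < total_len) {
        uint32_t remaining = total_len - offset;
        uint32_t len = remaining < max_len ? remaining : max_len;

        if (n < max_chunks) {
            out[n].offset = offset;
            out[n].length = len;
        }
        n++;
        offset += len;
    }
    return n;
}
```

For a write op the target would call this with its internal
MaxBurstLength and issue one RDMA Read per descriptor; for a read op it
would pass DataInMaxDSL and issue one RDMA Write per descriptor. How the
portions are then queued and pipelined is left out of the sketch.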

The internal values of MaxBurstLength and DataInMaxDSL should be chosen to
achieve good performance while keeping memory allocation requirements
reasonable. They may be either hardcoded or tunable.

Alexander Nezhinsky