[stgt] stgtd 0.9.3 : Read-Errors using iser transport

Mon Feb 23 08:07:25 CET 2009

Dr. Volker Jaenisch wrote:

> Just for the record. SRP runs without problems on our setup.
> Should we try to disable the caching in SRP? If yes - how do we do that?

Yes, you can patch the SRP initiator code to disable FMR pool caching with the
patch below. It also adds a print so you can see that FMRs are actually used, since under some page list patterns SRP may use "indirect mapping", which is not based on FMR. Once you have seen that the print does happen, you may want to remove it.

Index: linus-linux-2.6/drivers/infiniband/ulp/srp/ib_srp.c
===================================================================

--- linus-linux-2.6.orig/drivers/infiniband/ulp/srp/ib_srp.c
+++ linus-linux-2.6/drivers/infiniband/ulp/srp/ib_srp.c
@@ -688,7 +688,7 @@ static int srp_map_fmr(struct srp_target
 			       ~dev->fmr_page_mask);
 	buf->key = cpu_to_be32(req->fmr->fmr->rkey);
 	buf->len = cpu_to_be32(len);
-
+	printk(KERN_INFO "srp fmr mapping done, va %llx\n", buf->va);
 	ret = 0;
 
 out:
@@ -2019,7 +2019,7 @@ static void srp_add_one(struct ib_device
 	memset(&fmr_param, 0, sizeof fmr_param);
 	fmr_param.pool_size	    = SRP_FMR_POOL_SIZE;
 	fmr_param.dirty_watermark   = SRP_FMR_DIRTY_SIZE;
-	fmr_param.cache		    = 1;
+	fmr_param.cache		    = 0;
 	fmr_param.max_pages_per_fmr = SRP_FMR_SIZE;
 	fmr_param.page_shift	    = srp_dev->fmr_page_shift;
 	fmr_param.access	    = (IB_ACCESS_LOCAL_WRITE |


> Where may I get a good source of information about how all these things
> work together?

Basically, if you are familiar with the kernel SCSI LLD queuecommand (and friends) interface to the mid-layer, your delta is mostly RDMA programming, specifically for transactional (e.g. request / data / response based) I/O protocols such as SRP, iSER and some file systems.
The paper by Pete et al. at http://www.osc.edu/~pw/papers/dalessandro-iser-snapi07.pdf provides some background, but it is focused on the target side. As I said, FMR is a mechanism for registering a list of physical pages with the HCA, producing a <key, va> pair which is later used for RDMA initiated by the target. The HCA uses the <key, va> pair to look up its network MMU table and serve that RDMA transaction.
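
To make the <key, va> idea concrete, here is a minimal user-space sketch in C. It is only a toy model of the lookup, not the kernel FMR API and not how any real HCA stores its tables: every toy_* name, the single-entry "table" and the 4 KB page size are assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for the toy model only. */
#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  (1ULL << TOY_PAGE_SHIFT)
#define TOY_MAX_PAGES  8

/* One entry of a hypothetical "network MMU" table. */
struct toy_mapping {
	uint32_t key;                   /* handle handed to the remote side */
	uint64_t va;                    /* I/O virtual address of the first byte */
	uint64_t len;                   /* number of bytes covered */
	uint64_t pages[TOY_MAX_PAGES];  /* page-aligned physical addresses */
	size_t   npages;
};

/*
 * "Register" a list of physical pages under <key, va>.
 * Returns 0 on success, -1 if the page list cannot cover the buffer.
 */
static int toy_map(struct toy_mapping *m, uint32_t key, uint64_t va,
		   uint64_t len, const uint64_t *pages, size_t npages)
{
	uint64_t offset = va & (TOY_PAGE_SIZE - 1);
	size_t i;

	assert(m != NULL && pages != NULL);

	if (len == 0 || npages == 0 || npages > TOY_MAX_PAGES)
		return -1;
	if (offset + len > npages * TOY_PAGE_SIZE)
		return -1;

	for (i = 0; i < npages; i++) {
		if (pages[i] & (TOY_PAGE_SIZE - 1))
			return -1;      /* pages must be page aligned */
		m->pages[i] = pages[i];
	}
	m->key = key;
	m->va = va;
	m->len = len;
	m->npages = npages;
	return 0;
}

/*
 * What the adapter conceptually does when an RDMA request carrying
 * <key, va> arrives: check the key and the range, then translate the
 * virtual address to a physical one through the registered page list.
 */
static int toy_translate(const struct toy_mapping *m, uint32_t key,
			 uint64_t va, uint64_t *phys)
{
	uint64_t off;

	assert(m != NULL && phys != NULL);

	if (key != m->key || va < m->va || va - m->va >= m->len)
		return -1;

	/* Offset from the start of the first registered page. */
	off = (m->va & (TOY_PAGE_SIZE - 1)) + (va - m->va);
	*phys = m->pages[off >> TOY_PAGE_SHIFT] + (off & (TOY_PAGE_SIZE - 1));
	return 0;
}
```

In this model the initiator would call toy_map() once per I/O and send <key, va, len> to the target in the command; each RDMA read or write from the target then goes through something like toy_translate(). A request with the wrong key or an address outside the registered range is rejected.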

> Since the error can be generated by using more than one thread on a
> single core on a system with only one physical CPU, HT is ruled out
> as the primary source of the error. Right?

I am not sure I follow.


Or.