[stgt] stgtd 0.9.3 : Read-Errors using iser transport
robin.humble+stgt at anu.edu.au
Sun Feb 22 14:41:37 CET 2009
On Sun, Feb 22, 2009 at 02:53:00PM +0200, Or Gerlitz wrote:
> Dr. Volker Jaenisch wrote:
>> every combination that I've tried when there are multiple
>> simultaneous readers Reproduced that. On a single core more than one
>> simultaneous threads accessing the LUN over iSER also give read
> OK, Thanks a lot for doing all this testing / bug hunting work.
> I read the Feb 2008 "iser multiple readers" thread and wasn't sure if /
> what was the conclusion.
just to chime in here - I don't think there was any conclusion from 12
months ago... as I was the only one seeing problems at that time, it
(quite rightly) couldn't be ruled out that there was something odd with
now that other people are seeing problems too, the chances that the
problem is real and of finding a fix are better.
however, it turns out that we don't need iSER in production any time
soon, so I haven't been spending any time on it. but let me know if you
want me to test a fix and I'll try to find time to break it :-)
> OTOH Robin reported that the patch that slows
> down tgt not to send the scsi response before the rdma write is
> completed eliminated the error but OTOH Pete was doing some analysis of
> the errors, @
>> "The offsets are always positive, which fits in with the theory that
>> future RDMAs are overwriting earlier ones. This goes against the
>> theory in your (my) patch, which guesses that the SCSI response message
>> is sneaking ahead of RDMA operations."
> and here starts the talking on possible relations of this error with
> FMRs, where Pete suggested to disable FMRs and see if the problem
> persists, I wasn't sure if you did that.
>> My guess is that the AMD hyper-transport may interfere with the fmr.
>> But I am no linux memory management specialist .. so please correct me
>> if I am wrong. Maybe the following happens: Booted with one CPU, all
>> FMR requests go to the 16GB RAM this single CPU directly addresses
>> via its memory controller. In case of more than one active CPU the
>> memory is fetched from both CPUs memory controllers with preference
>> to local memory. In seldom cases the memory manager fetches memory for
>> the FMR process running on CPU0 from the CPU1 via the hyper-transport
>> channel and something weird happens.
> To make sure we are on the same page (...) here: FMR (Fast Memory
> Registration) is a means to register with the HCA a (say) arbitrary list
> of pages to be used for an I/O. This page SG (scatter-gather) list was
> allocated and provided by the SCSI midlayer to the iSER SCSI LLD
> (low-level-driver) through the queuecommand interface. So I read your
> comment as saying that when using one CPU and/or a system with one
> memory controller all I/Os are served with pages from the "same memory"
> where when this doesn't happen, something gets broken.
> I wasn't sure I followed the sentence "In seldom cases the memory
> manager fetches memory for the FMR process running on CPU0 from the CPU1
> via the hyper-transport channel and something weird happens" - can you
> explain a bit what you were referring to?
> To unsubscribe from this list: send the line "unsubscribe stgt" in
> the body of a message to majordomo at vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html