[stgt] Infinite Loop on 1.0.26?

Brad.Goodman at emc.com
Fri Apr 13 04:04:44 CEST 2012



> -----Original Message-----
> From: FUJITA Tomonori [mailto:fujita.tomonori at lab.ntt.co.jp]
> Sent: Sunday, April 08, 2012 9:14 PM
> To: Goodman, Brad
> Cc: stgt at vger.kernel.org; alexandern at mellanox.com
> Subject: Re: Infinite Loop on 1.0.26?
> 
> On Sat, 7 Apr 2012 21:53:23 -0400
> <Brad.Goodman at emc.com> wrote:
> 
> >> > I have never seen this type of behavior ever, on prior
> >> > versions. Barring that investigation when/if this happens again - I
> >> > just wanted to see if this was a "known" issue, or anyone had ever
> >> > seen anything like this before. Is this new? Any ideas?
> >>
> >> As far as I know, it's new. What the last tgt version worked well for
> >> you?
> >
> > We have done a decent amount of testing with versions 1.0.14 and 1.0.23
> >
> > Our testing with 1.0.26 has been fairly short (I'm guessing under 10 minutes actual total run-time).
> >
> > However, in prior versions our testing has been limited to a maximum of two initiators, whereas our 1.0.26 has been with a maximum of 8 initiators. In both cases, again, testing has been specific to iSER.
> 
> Can you perform the same test against the old versions?


I have done some more testing and have not reproduced this exact bug. However, I have seen other issues with 1.0.26 which I believe may explain what was happening.

When we reproduced this bug with 1.0.26, it appears that it did not necessarily happen during our intensive data testing, but at some other time around it. I'm not sure exactly when, but it may have been AFTER the testing, possibly associated with other activities such as adding or removing initiators.

In other testing (though not with this exact bug), I have seen that if one accidentally uses a pre-1.0.26 version of tgtadm to talk to a 1.0.26 tgtd, memory appears to be leaked or corrupted. I have seen this manifest itself in several different, reproducible ways. For example, I can create a target and then query which targets exist, and the name of the target I just created shows up garbled. If I then try to add a LUN to that target, tgtadm reports that the LUN is "in use", although it doesn't even exist.

So I would conclude that there is a decent chance I had been using an older version of tgtadm, which may have caused this problem. One of the engineers on the project told me at one point that the data structures used for tgtadm-to-tgtd communication appeared to have changed in 1.0.26, which could cause compatibility issues. I believe that is exactly what I am seeing.
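To illustrate the kind of failure described above: if a newer tgtd expects a request layout with an extra field, every field after the insertion point is read at the wrong offset. The C sketch below uses two hypothetical request structures (`old_req`, `new_req` and the helper `decode_as_new` are made up for illustration and are NOT tgt's actual definitions) to show how a target name sent by an old client comes out shifted and truncated on the new side.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical "old" client request layout (not tgt's real struct). */
struct old_req {
    uint32_t op;
    uint32_t len;
    char name[16];
};

/* Hypothetical "new" server layout: one field inserted before name. */
struct new_req {
    uint32_t op;
    uint32_t len;
    uint32_t flags;   /* added in the newer version */
    char name[16];
};

/* Interpret raw bytes from the socket using the NEW layout, the way a
 * server without any version check would. Copies at most
 * sizeof(struct new_req) bytes; anything not received stays zero. */
struct new_req decode_as_new(const void *buf, size_t buflen)
{
    struct new_req req;

    memset(&req, 0, sizeof(req));
    memcpy(&req, buf, buflen < sizeof(req) ? buflen : sizeof(req));
    return req;
}
```

If an old client sends the name "iqn.test", the new-layout decoder finds the first four bytes of the name ("iqn.") in `flags` and only "test" in `name`: nothing fails outright, the data is simply misread, which matches the "garbled name" symptom.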

I would suggest:

1. That data structures whose layout may change incompatibly be stamped with some sort of "version number", so that messages sent with an incompatible version can be rejected.

2. Safeguards against the kinds of (apparent) buffer leaks that can happen when bad or incompatible data is sent.
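Both suggestions can be combined in the receive path. The sketch below assumes a hypothetical management header (`struct mgmt_hdr`, `MGMT_MAGIC`, `MGMT_VERSION` and `mgmt_validate` are illustrative names, not part of tgt) carrying a magic number, a layout version and a total length; the receiver rejects a message before touching its body if any of them disagree with what it expects.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire format; tgt's real tgtadm/tgtd protocol differs. */
#define MGMT_MAGIC   0x74677461u  /* arbitrary protocol marker */
#define MGMT_VERSION 2u           /* bump whenever the layout changes */

struct mgmt_hdr {
    uint32_t magic;    /* identifies the protocol */
    uint32_t version;  /* layout version of the sender */
    uint32_t len;      /* total message length, header included */
};

enum mgmt_status {
    MGMT_OK = 0,
    MGMT_ERR_SHORT,    /* fewer bytes than a header */
    MGMT_ERR_MAGIC,    /* not one of our messages */
    MGMT_ERR_VERSION,  /* sender uses an incompatible layout */
    MGMT_ERR_LENGTH,   /* declared length disagrees with bytes received */
};

/* Validate a received buffer before any field past the header is used. */
enum mgmt_status mgmt_validate(const void *buf, size_t received)
{
    struct mgmt_hdr hdr;

    if (received < sizeof(hdr))
        return MGMT_ERR_SHORT;
    memcpy(&hdr, buf, sizeof(hdr));  /* no assumptions about alignment */

    if (hdr.magic != MGMT_MAGIC)
        return MGMT_ERR_MAGIC;
    if (hdr.version != MGMT_VERSION)
        return MGMT_ERR_VERSION;
    if (hdr.len < sizeof(hdr) || hdr.len != received)
        return MGMT_ERR_LENGTH;
    return MGMT_OK;
}
```

With a check like this, a daemon would answer an old management client with a version error instead of misinterpreting the request, and a length field that lies about the payload size is refused rather than used to index into the buffer.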

Either way, given this information, I will keep a watchful eye out for issues, but I am willing to lay this one to rest for now.

Thanks for your time and attention,

Brad Goodman
EMC
