[wpkg-users] TQ: Best practice for RegExp use

Rainer Meier r.meier at wpkg.org
Tue Jul 19 00:09:56 CEST 2011


Hi Stefan,

On 18.07.2011 23:56, Stefan Pendl wrote:
> I have since used the following to match multiple hosts:
>
> name="CAD[0-9]+|CAM[0-9]+"
>
> This matches hosts like CAD09, CAD35, CAM02, CAM15, etc.
>
> Now in a recent message it was suggested to change those patterns into the following, if one uses the "hostname" instead of the
> "name" attribute.
>
> hostname="^(CAD[0-9]+|CAM[0-9]+)$"
>
> I have already used the hostname attribute to change package variables for individual groups of hosts.
> They use the hostname attribute but the old pattern, and they work so far.
>
> Can anyone with deeper knowledge of WSH RegExp advise on how to proceed?
> It might be good to point out the pros and cons, or the dos and don'ts.
>
> We could then update the documentation to include the best practice advice.

I don't know if there is a "best practice" for regular expressions. It all
depends on what you would like to match.
However, there is one thing you already demonstrated: when matching host names
you usually want to match the full host name, not just part of it.
This is achieved by prefixing the regex with "^" and ending it with "$".

For example:
^(CAD[0-9]+|CAM[0-9]+)$

will match only if the expression matches the full name, e.g.:
CAD000000
CAD00000123
CAD192834324

but does not match
CAD00a
CAD90232348913-2
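The parentheses around the alternatives matter as well. Without the group, "|" has the lowest precedence, so "^" only attaches to the first alternative and "$" only to the last one. Below is a small TypeScript sketch of that; it assumes JavaScript's RegExp behaves like the WSH JScript RegExp for these simple patterns, and `matches` is a hypothetical helper, not a WPKG function.

```typescript
// Assumption: JavaScript RegExp semantics match WSH JScript for these patterns.
// Hypothetical helper: true if `pattern` matches anywhere in `host`,
// which is how RegExp.prototype.test() behaves.
function matches(pattern: string, host: string): boolean {
  return new RegExp(pattern).test(host);
}

// Anchors apply to the whole group: the full name must be CADnnn or CAMnnn.
const grouped = "^(CAD[0-9]+|CAM[0-9]+)$";
// No group: this means "^CAD[0-9]+" OR "CAM[0-9]+$".
const ungrouped = "^CAD[0-9]+|CAM[0-9]+$";

for (const host of ["CAD09", "CAM15", "CAD00a", "xCAM15"]) {
  console.log(host, matches(grouped, host), matches(ungrouped, host));
}
// CAD09 true true
// CAM15 true true
// CAD00a false true
// xCAM15 false true
```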

Using only this expression:
(CAD[0-9]+|CAM[0-9]+)

would match all the host examples above, because the expression is allowed to match a substring of the name as well.
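A minimal TypeScript sketch of the difference, using the host names from the examples above (again assuming JavaScript RegExp semantics match the WSH JScript RegExp for these patterns):

```typescript
// Host names taken from the examples above.
const hosts = ["CAD000000", "CAD00000123", "CAD192834324", "CAD00a", "CAD90232348913-2"];

const anchored = /^(CAD[0-9]+|CAM[0-9]+)$/;  // must match the whole name
const unanchored = /(CAD[0-9]+|CAM[0-9]+)/;  // may match any substring

for (const host of hosts) {
  console.log(host, "anchored:", anchored.test(host), "unanchored:", unanchored.test(host));
}
// The first three names match both; "CAD00a" and "CAD90232348913-2"
// match only the unanchored expression.
```

The regexes are created without the "g" flag on purpose: with "g", RegExp.test() remembers lastIndex between calls, which gives surprising results when the same object is reused for several host names.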

Note that this is different from regexp matching in the "legacy" name=""
attribute. There WPKG prepends '^' and appends '$' automatically. So a name
of "CAD.*" is first matched literally (matching only hosts which really
are named "CAD.*"), and if that does not match, WPKG turns it into the
"^CAD.*$" expression and matches again.

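The two-step legacy behaviour described above could be sketched like this. It is only an illustration, assuming WPKG simply concatenates '^', the attribute value and '$'; `legacyNameMatches` is a hypothetical name and this is not WPKG's actual code.

```typescript
// Hypothetical sketch of the legacy name="" matching described above.
// Assumption: the regex is built by plain concatenation "^" + name + "$".
function legacyNameMatches(name: string, host: string): boolean {
  // Step 1: literal comparison.
  if (name === host) {
    return true;
  }
  // Step 2: treat the value as a regular expression anchored at both ends.
  return new RegExp("^" + name + "$").test(host);
}

console.log(legacyNameMatches("CAD.*", "CAD.*"));  // true (literal match)
console.log(legacyNameMatches("CAD.*", "CAD09"));  // true (via "^CAD.*$")
console.log(legacyNameMatches("CAD.*", "XCAD09")); // false (anchored at the start)
// Under plain concatenation the anchors bind to single alternatives:
console.log(legacyNameMatches("CAD[0-9]+|CAM[0-9]+", "CAD09x")); // true
```

If the wrapping really is plain concatenation, a legacy value containing "|" becomes "^CAD[0-9]+|CAM[0-9]+$", where each anchor applies to only one alternative; the last call shows the effect.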
For the new "hostname" attribute this is not the case, because it is specified
from the beginning that the value is interpreted as a regular expression. So you
need to decide yourself whether you want to match the whole string or not.
Conversely, you can also specify something like "AD[0-9]+$", which matches
CAD00
FAD11
something-uglyAD21
whateverAD9

Of course you could achieve the same using full-host match:
^.*AD[0-9]+$
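A small TypeScript check that the two forms accept the same names (assuming JavaScript RegExp semantics match WSH JScript here). The first four samples come from the list above; "CAD00a" and "AD" are extra made-up negative examples.

```typescript
const suffixOnly = /AD[0-9]+$/;   // anchored at the end only
const fullHost = /^.*AD[0-9]+$/;  // same condition written as a full-name match

const samples = ["CAD00", "FAD11", "something-uglyAD21", "whateverAD9", "CAD00a", "AD"];

for (const host of samples) {
  // Both columns agree for every sample.
  console.log(host, suffixOnly.test(host), fullHost.test(host));
}
```

Which one to use is mostly a matter of readability; the full-host form makes it explicit that the whole name is being examined.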

Regular expressions give you a lot of flexibility, but they can be hard to read
for inexperienced people, or when the expression is overly complex.

Well, others might come up with some additional "best practices".

br,
Rainer


