Easy (I'm sure) regexp question
daneyul
2006-11-29 16:35:13 UTC
Examples on regexp seem to be always missing the case of an asterisk at
the front of the expression. How is that handled? Here's what I get:

% set x *123
*123

% regexp $x 123
couldn't compile regular expression pattern: quantifier operand invalid

So...I tried preceding the variable with the asterisk with a period
(matches any single char)

% regexp .$x 123
1

That seems to work ok, but if I then make a variable without the
asterisk...

% set x 123
123

The preceding period no longer works...

% regexp .$x 123
0

I know this is probably simple, but all examples seem to be missing the
proper procedure here and I'm a regexp novice. How would I accomplish
the above (i.e., formulate a regexp that would handle a variable with or
without an asterisk wildcard anywhere within it)?

-Daniel
suchenwi
2006-11-29 16:43:02 UTC
Post by daneyul
Examples on regexp seem to be always missing the case of an asterisk at
% set x *123
*123
% regexp $x 123
couldn't compile regular expression pattern: quantifier operand invalid
* is special in re_syntax, as it quantifies the preceding element as
"zero or more". For a literal asterisk,
- escape it with backslash {\*123}, or
- put it in class brackets {[*]123}

In both cases, brace the RE to prevent Tcl's parser from interpreting \
or [] in its own way. re_syntax is a quite different "little language".
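A quick check of both forms in tclsh (a sketch; the sample strings are made up for illustration):

```tcl
# Two ways to match a literal asterisk. Both REs are braced so that
# Tcl's parser leaves the backslash and the brackets alone.
set s *123
puts [regexp {\*123} $s]    ;# 1: backslash-escaped asterisk
puts [regexp {[*]123} $s]   ;# 1: asterisk inside a bracket expression
puts [regexp {\*123} 123]   ;# 0: the string has no asterisk
```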
Glenn Jackman
2006-11-29 16:58:24 UTC
Post by daneyul
Examples on regexp seem to be always missing the case of an asterisk at
% set x *123
*123
% regexp $x 123
couldn't compile regular expression pattern: quantifier operand invalid
So...I tried preceding the variable with the asterisk with a period
(matches any single char)
% regexp .$x 123
1
That seems to work ok, but if I then make a variable without the
asterisk...
% set x 123
123
The preceding period no longer works...
% regexp .$x 123
0
I know this is probably simple, but all examples seem to be missing the
proper procedure here and I'm a regexp novice. How would I accomplish
the above (i.e., formulate a regexp that would handle a variable with or
without an asterisk wildcard anywhere within it)?
If your input is using an asterisk as a wildcard, then you're not using
regular expressions, you're using glob patterns. Use [string match]
instead:

set pattern *123 ;# match any string ending in "123"
string match $pattern 123 ;# true
string match $pattern abc123 ;# true
string match $pattern 1234 ;# false

set pattern *123* ;# match any string containing "123"
string match $pattern abc123def ;# true

If you *must* use regular expressions, you have to filter the pattern
first:
set input *123?456
set pattern [string map {* .* ? . \\ \\\\ + \\+ \{ \\\{ } $input]
# I may be missing some stuff there.
# there's probably something more robust on the wiki

set string abc123x456 ;# a sample string to test
regexp $pattern $string
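A somewhat more complete conversion can be sketched as a small proc. This is only a sketch under stated assumptions: the name `globToRegexp` is hypothetical, it understands only the `*` and `?` wildcards (not `[chars]` sets or backslash escapes from `string match`), every other non-word character is backslash-escaped so it matches literally, and the result is anchored with `^` and `$` because `string match` compares the whole string while `regexp` looks for a match anywhere.

```tcl
# Hypothetical helper: convert a glob that uses only * and ? into an
# anchored ARE. Other non-word characters are escaped with a backslash.
proc globToRegexp {glob} {
    set re ""
    foreach c [split $glob ""] {
        switch -- $c {
            * { append re .* }
            ? { append re . }
            default {
                if {[regexp {\w} $c]} {
                    append re $c
                } else {
                    append re \\$c
                }
            }
        }
    }
    return ^$re\$
}

set re [globToRegexp *123?456]
puts $re                              ;# ^.*123.456$
puts [regexp $re abc123x456]          ;# 1
puts [regexp $re abc123x456z]         ;# 0: anchored at the end
puts [regexp [globToRegexp a+b] a+b]  ;# 1: the + is matched literally
```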

Does Tcl have something like Perl's quotemeta()?
--
Glenn Jackman
Ulterior Designer
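On the quotemeta() question: as far as I know, Tcl has no built-in command by that name. One common idiom, sketched here under the hypothetical name `quotemeta`, backslash-escapes every non-word character with `regsub`; in an ARE, a backslash followed by a non-alphanumeric character matches that character literally.

```tcl
# Hypothetical helper named after Perl's quotemeta(): put a backslash in
# front of every non-word character so the result, used as an ARE,
# matches the input text literally.
proc quotemeta {s} {
    regsub -all {\W} $s {\\&} quoted
    return $quoted
}

set lit [quotemeta *123?456]
puts $lit                      ;# \*123\?456
puts [regexp $lit x*123?456]   ;# 1
puts [regexp $lit x1234456]    ;# 0: * and ? are no longer wildcards
```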
Larry W. Virden
2006-11-29 17:02:25 UTC
Post by daneyul
Examples on regexp seem to be always missing the case of an asterisk at
Can you describe what you actually are wanting to match? I'm going to
describe something, but I don't know if I am matching your expectation.

Remember that regular expressions are different from shell-level
globbing, even though there are some vague similarities.

A pattern of {*123} isn't, by itself, a valid regular expression. There
are two alternatives that would be valid:

{\*123}

meaning "match a literal asterisk followed by the 3 numeric digits 1,
2, and 3" or

{.*123}

meaning "match a string which has 0 or more characters of any type,
followed by the 3 numeric digits 1, 2, and 3".
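The difference between the two alternatives shows up when both are run against a few sample strings (a sketch; the strings are arbitrary):

```tcl
# {\*123}: a literal asterisk followed by 123
# {.*123}: zero or more characters of any type followed by 123
foreach s {*123 abc123 123 12} {
    puts [format "%-8s literal: %d  any-prefix: %d" $s \
        [regexp {\*123} $s] [regexp {.*123} $s]]
}
```

Only `*123` matches the first pattern; every string except `12` matches the second.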

If neither of these is what you were trying to achieve, please
describe to us what you were wanting in more detail.
Post by daneyul
% set x *123
*123
% regexp $x 123
couldn't compile regular expression pattern: quantifier operand invalid
The * as well as the + and {n,m} are metacharacters in regular
expressions. Their role is to indicate "how many" occurrences of the
previous regular expression atom should constitute a match.

Thus the expression:

a*

means zero or more repetitions of the character a and

[^aeiou]+

means one or more characters which are not English vowels.
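A few checks of these quantifiers, anchored with `^` and `$` so the whole string must match (a sketch with made-up inputs):

```tcl
puts [regexp {^a*$} ""]           ;# 1: zero repetitions of a
puts [regexp {^a*$} aaa]          ;# 1
puts [regexp {^[^aeiou]+$} xyz]   ;# 1: one or more non-vowels
puts [regexp {^[^aeiou]+$} ""]    ;# 0: + needs at least one
puts [regexp {^[^aeiou]+$} xyza]  ;# 0: ends in a vowel
puts [regexp {^a{2,3}$} aaaa]     ;# 0: {2,3} allows two or three only
```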

Saying {*123} leaves the regular expression parser in a quandary,
because you've said you want 0 or more occurrences of some atom, but
haven't defined the atom.
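The error can be reproduced, and caught, like this (a sketch; `catch` stores the error message in `msg`):

```tcl
# A quantifier with nothing in front of it is rejected when the RE is
# compiled; catch turns the error into a return code we can test.
if {[catch {regexp {*123} 123} msg]} {
    puts $msg
}
# Giving * an atom to quantify makes the pattern valid.
puts [regexp {.*123} 123]   ;# 1
```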
Post by daneyul
So...I tried preceding the variable with the asterisk with a period
(matches any single char)
% regexp .$x 123
1
That seems to work ok, but if I then make a variable without the
asterisk...
% set x 123
123
The preceding period no longer works...
% regexp .$x 123
0
Ah - in the first case, you have, essentially, typed this:

regexp .*123 123

which says "match zero or more characters followed by 123", which your
string, of course, matches.

In the second case, you have indicated:

regexp .123 123

which says "match some character followed by 123" and again, of
course, the second string does NOT match, since there is no character
before the 123.
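Both cases can be replayed directly; the comments show the pattern `regexp` actually receives after Tcl substitutes `$x`:

```tcl
set x *123
puts [regexp .$x 123]    ;# pattern .*123: 1
set x 123
puts [regexp .$x 123]    ;# pattern .123, needs a character before 123: 0
puts [regexp .$x x123]   ;# pattern .123: 1
```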
Post by daneyul
How would I accomplish
the above (i.e., formulate a regexp that would handle a variable with or
without an asterisk wildcard anywhere within it)?
I suspect that I still don't understand the context that you are
shooting for. The regular expression needs to be a valid one. The
regular expressions are doing exactly what you are asking them to do;
since the result isn't what you expected, neither I nor Tcl understands
what you are hoping to do.
Alan Anderson
2006-11-29 17:13:48 UTC
Post by daneyul
Examples on regexp seem to be always missing the case of an asterisk at
the front of the expression.
My minimal understanding of regular expressions tells me that an
asterisk at the front of an expression is not valid.
Post by daneyul
% set x *123
*123
What strings do you think a regular expression of *123 should match?
Post by daneyul
% regexp $x 123
couldn't compile regular expression pattern: quantifier operand invalid
Are you sure you're putting the regular expression and the string to be
matched against in the right order?
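For reference, `regexp` takes the pattern first and the string second; swapping them silently changes the meaning rather than raising an error (a sketch):

```tcl
puts [regexp {.*123} abc123]   ;# 1: pattern first, string second
puts [regexp abc123 {.*123}]   ;# 0: abc123 is now treated as the pattern
```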
daneyul
2006-11-29 18:14:15 UTC
Thanks for the quick responses!

Sorry for not being clear. I did want to use the asterisk as a
wildcard, not a literal.

Glenn's answer ( to use string match ) will actually suit my needs
here. I do need to learn a lot more about the regexp syntax, as I was
treating the * as though I were globbing. In my limited and simplistic
use of regexp, I guess I'm used to just treating it as a quick string
pattern checker. string match is better for this, I see, as it handles
the asterisk and is much less complicated for this use.

Thanks again!

-Daniel
Donald Arseneau
2006-11-29 19:56:00 UTC
Post by daneyul
Glenn's answer ( to use string match ) will actually suit my needs
here. I do need to learn a lot more about the regexp syntax, as I was
treating the * as though I were globbing.
For future reference, the regexp pattern ".*" is the same as the
glob wildcard "*".
--
Donald Arseneau ***@triumf.ca
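One caveat on the `.*` / `*` equivalence: `string match` always compares against the whole string, while `regexp` succeeds if the pattern matches anywhere, so the regexp needs `^` and `$` anchors to behave the same way. A sketch comparing the two on arbitrary strings:

```tcl
foreach s {abc123 1234 123} {
    set glob [string match *123 $s]
    set re   [regexp {^.*123$} $s]
    puts "$s: string match $glob, anchored regexp $re"
}
puts [regexp {.*123} 1234]   ;# 1: unanchored, 123 occurs inside 1234
```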
Michael A. Cleverly
2006-12-01 01:44:42 UTC
Post by Alan Anderson
Post by daneyul
Examples on regexp seem to be always missing the case of an asterisk at
the front of the expression.
My minimal understanding of regular expressions tells me that an
asterisk at the front of an expression is not valid.
Obscure trivia: The only time that * is valid at the front of a regexp in
Tcl is if it is followed by two more asterisks and then either a colon or
an equal sign.

***: tells the regexp engine that what follows is an advanced regular
expression (ARE). Since this is the default I don't think I've _ever_
encountered it or used it anywhere.

***= tells the regexp engine that everything that follows is a literal
string (where every character should be considered a literal character).
I've never actually used it myself, but I'm told it can be quite useful
when using the text widget's search feature. (Cf.
http://blog.cleverly.com/permalinks/147.html)
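Both directors can be tried directly. With `***=`, the contents of a variable such as `$x` are matched literally, asterisk included; as with any regexp, the match is not anchored. A sketch:

```tcl
set x *123
puts [regexp ***=$x a*123]        ;# 1: a*123 contains the literal text *123
puts [regexp ***=$x a123]         ;# 0: no asterisk in the string
puts [regexp {***=.*} abc]        ;# 0: . and * are ordinary characters here
# ***: says the rest is an ARE, which is already the default
puts [regexp {***:.*123} abc123]  ;# 1
```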

:-)

Michael
