Discussion:
Regex continuation strangeness
gamename
2005-08-14 04:49:42 UTC
Hi,

I'm using a regex to detect errors. Since there are numerous possible
errors, I'm trying to store them in a var. For some reason, the regex
stops working when the var has a continuation. Example:

expect1.99> set line "% Invalid input detected at '^' marker."
% Invalid input detected at '^' marker.

expect1.100> set ERROR {\
^%|\
}
expect1.101> regexp $ERROR $line
1 <--- worked
expect1.102> set ERROR {\
^%|\
^% Invalid input detected\
}
expect1.103> regexp $ERROR $line
0 <--- Failed

Why would it work in the first instance and fail in the second?

TIA,
-Tennis
Benjamin Riefenstahl
2005-08-14 15:19:01 UTC
Hi,
Post by gamename
expect1.99> set line "% Invalid input detected at '^' marker."
% Invalid input detected at '^' marker.
expect1.100> set ERROR {\
^%|\
}
In the Tcl documentation (see man n re_syntax), I don't see a
definition of what "\" at the end of a line means in regular
expressions. From the results (see below) it is evaluated the same as
in any braced Tcl word: the Tcl parser replaces "\" + newline + any
following spaces and tabs with one space, before regexp ever sees the
pattern. That makes your expression equivalent to

% set ERROR { ^%| }
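
The equivalence can be checked directly. The following is a minimal sketch, assuming a plain tclsh (or expect) session; it only compares the continued value against the one-line form and prints its length.

```tcl
# Backslash-newline is the one substitution Tcl still performs inside braces:
# each "\" + newline (plus any following spaces/tabs) becomes a single space.
set ERROR {\
^%|\
}

# Compare against the one-line spelling, with a leading and a trailing space.
puts [string equal $ERROR " ^%| "]   ;# prints 1
puts [string length $ERROR]          ;# prints 5
```

So the stored pattern is five characters long, and both the leading and the trailing space are part of the regular expression.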
Post by gamename
expect1.101> regexp $ERROR $line
1 <--- worked
Depending on what "worked" means for you:

% set line "% Invalid input detected at '^' marker."
% Invalid input detected at '^' marker.
% set ERROR {\
^%|\
}
^%|
% if {[regexp $ERROR $line match]} {puts "<$match>"}
< >
%

IOW, the part *after* the "|" in your expression matched the first
space in the input string.
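
To see where the match lands, `regexp -indices` can report the character range instead of the text. This is a small sketch, using the one-line equivalent of the pattern rather than the continued form; the variable name `span` is just an illustration.

```tcl
set line "% Invalid input detected at '^' marker."
set ERROR " ^%| "

# -indices stores the first and last index of the match instead of the text.
set found [regexp -indices -- $ERROR $line span]
puts $found   ;# prints 1
puts $span    ;# prints "1 1": the single space right after the leading "%"
```

The first alternative, " ^%", never participates: the match is the lone space at index 1.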
Post by gamename
expect1.102> set ERROR {\
^%|\
^% Invalid input detected\
}
Which would be equivalent to

% set ERROR { ^%| ^% Invalid input detected }

Of course neither of those two alternatives occurs in your string. In
fact, since "^" means "beginning of the string", " ^" (a space followed
by "^") can never match at all.
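
One way to keep many alternatives in a variable without line continuations is to store them as a Tcl list and join them with "|". This is only a sketch of an alternative, not something proposed in the thread; the variable name `patterns` is hypothetical.

```tcl
set line "% Invalid input detected at '^' marker."

# The collapsed form of the continued pattern never matches this line.
puts [regexp -- { ^%| ^% Invalid input detected } $line]   ;# prints 0

# Hypothetical alternative: one braced list element per sub-expression,
# joined with "|" so that no stray spaces end up in the pattern.
set patterns {
    {^%}
    {^% Invalid input detected}
}
set ERROR [join $patterns |]
puts $ERROR                                  ;# ^%|^% Invalid input detected
set found [regexp -- $ERROR $line match]
puts $found                                  ;# prints 1
```

Because each list element is braced, spaces inside a sub-expression are kept exactly as written, while the whitespace between elements is only list formatting.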


You may have been thinking of what re_syntax calls "expanded syntax",
so maybe this is what you want:

% set ERROR {
^%| # Simple version
^%\ Invalid\ input\ detected # Complex version
}

^%| # Simple version
^%\ Invalid\ input\ detected # Complex version

% if {[regexp -expanded $ERROR $line match]} {puts "<$match>"}
<% Invalid input detected>
%

Note that there is no reason to escape the newlines. But OTOH in this
syntax you need to escape those spaces that should be part of the
actual sub-expressions.
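
The effect of escaping the spaces can be seen by comparing the two spellings side by side. A minimal sketch, assuming the same `line` as above; the variable names `unescaped` and `escaped` are made up for the illustration.

```tcl
set line "% Invalid input detected at '^' marker."

# With -expanded, unescaped whitespace is ignored, so this pattern is
# effectively "^%Invalidinputdetected".
set unescaped {
    ^% Invalid input detected
}

# A backslash keeps each space as a literal part of the pattern.
set escaped {
    ^%\ Invalid\ input\ detected
}

puts [regexp -expanded -- $unescaped $line]   ;# prints 0
puts [regexp -expanded -- $escaped $line]     ;# prints 1
```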


benny
