Discussion:
string map and unicode
lamuzz...@gmail.com
2021-03-25 00:13:43 UTC
Hello,
Consider this code:

set y "\" something \"";
puts $y ;# > " something "
puts "\u201D \u005C\u201D" ;# > ” \”
puts [string map {\" \\\"} $y] ;# > \" something \"
puts [string map {\u201D \u005C\u201D} $y] ;# > " something "

I assumed both puts calls (with string map) are the same, but the last one doesn't work as I expected. What am I doing wrong?
Thanks,
Regards,

Alejandro
Rich
2021-03-25 02:51:33 UTC
Post by ***@gmail.com
Hello,
set y "\" something \"";
puts $y ;# > " something "
puts "\u201D \u005C\u201D" ;# > ” \”
puts [string map {\" \\\"} $y] ;# > \" something \"
puts [string map {\u201D \u005C\u201D} $y] ;# > " something "
I'm guessing both puts (with string map) are the same
They are not.
Post by ***@gmail.com
, however the last doesn't work as expected (by me).
Because it is different from the first.
Post by ***@gmail.com
What am I doing wrong?
The string in $y does not contain a \u201D character. So there is
nothing for string map to replace.
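
A quick way to confirm this is to ask Tcl where each character occurs in the string. This is only a minimal sketch that reuses $y from the original post; the other variable names are made up for illustration.

```tcl
set y "\" something \""

# string first returns the index of the first match, or -1 if there is none.
set asciiQuoteIndex [string first "\u0022" $y]
set rightQuoteIndex [string first "\u201D" $y]

puts "U+0022 first found at index: $asciiQuoteIndex"
puts "U+201D first found at index: $rightQuoteIndex"
```

An index of -1 for U+201D means string map has nothing to match against.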

Also, be careful of Tcl rule 6 (from man Tcl):

[6] Braces.
If the first character of a word is an open brace ("{")
and rule [5] does not apply, then the word is terminated
by the matching close brace ("}"). Braces nest within
the word: for each additional open brace there must be
an additional close brace (however, if an open brace or
close brace within the word is quoted with a backslash
then it is not counted in locating the matching close
brace). **No substitutions are performed on the
characters between the braces except for
backslash-newline substitutions described below, nor do
semi-colons, newlines, close brackets, or white space
receive any special interpretation.** The word will
consist of exactly the characters between the outer
braces, not including the braces themselves.

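Your string maps appear to work only because of a quirk: the braced
string is shimmered from a string to a key/value list when string map
performs the mapping, and the list parser does its own backslash
substitution on each element. You can see that the braces themselves
substitute nothing if you just set a variable to a braced string: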

% set x {\u0022 \u005C\u201D}
\u0022 \u005C\u201D
% set x
\u0022 \u005C\u201D

The \u escapes were not interpreted, because of the braces. It is best
to use list for creating the mapping if there is anything in the map
that requires expansion:

string map [list \u201D \u005C\u201D] $y

Then the \u escapes are actual words that will be guaranteed to be
handled by Tcl rule 9 (from man Tcl):

[9] Backslash substitution.
If a backslash ("\") appears within a word then
backslash substitution occurs. In all cases but those
described below the backslash is dropped and the
following character is treated as an ordinary character
and included in the word. This allows characters such as
double quotes, close brackets, and dollar signs to be
included in words without triggering special
processing. ...
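
The difference between the braced map and the list-built map can be checked directly. The sketch below is only an illustration (the names braced and built are hypothetical); it compares the raw braced string with what comes out once the value is read as a list.

```tcl
# Braces: the word is stored exactly as typed, escapes and all.
set braced {\u201D \u005C\u201D}
puts [string length $braced]             ;# 19: the literal escape text

# Reading it as a list makes the list parser substitute the escapes.
puts [string length [lindex $braced 0]]  ;# 1: the single character U+201D
puts [string length [lindex $braced 1]]  ;# 2: a backslash, then U+201D

# list: the Tcl parser substitutes the escapes before list ever sees them.
set built [list \u201D \u005C\u201D]
puts [expr {[lindex $built 0] eq [lindex $braced 0]}]  ;# 1
puts [expr {[lindex $built 1] eq [lindex $braced 1]}]  ;# 1
```

Both forms end up as the same two-element map here, but only the list form gets there through the documented substitution rules rather than through list parsing of a literal string.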
lamuzz...@gmail.com
2021-03-26 00:01:15 UTC
Hi Rich,
The string in $y does not contain a \u201D character. So there is
nothing for string map to replace.
Isn't the double quote the "same" as \u201D?
Is there a context where the two are interchangeable?
The \u escapes were not interpreted, because of the braces. It is best
to use list for creating the mapping if there is anything in the map
string map [list \u201D \u005C\u201D] $y ;# <------- (*1)
Then the \u escapes are actual words that will be guaranteed to be
But, as you said above, using (*1) would not be a solution in this case either.
So the only way is to use \" and \\\"?
Rich
2021-03-26 01:11:42 UTC
Post by ***@gmail.com
Hi Rich,
The string in $y does not contain a \u201D character. So there is
nothing for string map to replace.
Isn't the double quote the "same" as \u201D?
The basic keyboard double quote is a \u0022 character. The code point
\u201D is a "right double quotation mark" (i.e., the typographic,
directional closing quote mark). Because \u0022 is not equal to \u201D,
they are not the same to the computer.
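
To see the two code points side by side, something like the following sketch can be used. It relies only on the standard scan and format commands; the variable names are illustrative.

```tcl
# scan with %c yields the code point of the first character as a decimal integer.
scan "\u0022" %c asciiQuote
scan "\u201D" %c rightQuote

puts [format "U+%04X" $asciiQuote]   ;# U+0022
puts [format "U+%04X" $rightQuote]   ;# U+201D
puts [expr {"\"" eq "\u0022"}]       ;# 1: \" and \u0022 are the same character
puts [expr {"\u0022" eq "\u201D"}]   ;# 0: different code points
```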
Post by ***@gmail.com
Is there a context where the two are interchangeable?
Not in a string map (or other character level code handling) context.

How they appear to humans depends on the font, and most people will
likely see them as similar enough to carry the same contextual meaning.
But Tcl is not a human mind, so you have to be more exact for the
computer to understand.
Post by ***@gmail.com
The \u escapes were not interpreted, because of the braces. It is
best to use list for creating the mapping if there is anything in
string map [list \u201D \u005C\u201D] $y ;# <------- (*1)
Then the \u escapes are actual words that will be guaranteed to be
But, as you said above, using (*1) would not be a solution in this case either.
So the only way is to use \" and \\\"?
If you want to replace a \u0022 character (the ASCII " character code
point) then you have to tell string map that this is the character you
are replacing. You can either use \", or \u0022 which both mean the
same code point, an ASCII " character.
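
As a minimal sketch of both spellings, reusing $y from the original post (viaEscape and viaCodePoint are hypothetical names), the two maps below should give the same result:

```tcl
set y "\" something \""

# Both maps replace U+0022 with a backslash followed by U+0022.
set viaEscape    [string map [list \" \\\"] $y]
set viaCodePoint [string map [list \u0022 \u005C\u0022] $y]

puts $viaEscape                              ;# \" something \"
puts [expr {$viaEscape eq $viaCodePoint}]    ;# 1
```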
lamuzz...@gmail.com
2021-03-26 02:08:32 UTC
My mistake!
You got the point: I was searching for the wrong code point.
Now the code is working as I expected.
Thanks very much,

Alejandro
