Discussion:
tdom encoding
saito
2024-12-17 00:01:02 UTC
I am trying to see why tdom is failing on this json snippet.

package req tdom
set x {{"name":"Jeremi"}}
dom parse -json $x

==> error "JSON syntax error" at position 15
"{"name":"Jeremi <--Error-- "}"


If it doesn't get removed by the newsgroup editors, there is a weird
character at the very end of x. It looks almost like "[]" but it is
not. When you edit it, it acts as if it has multiple characters in it.


Another problem is that the tdom man page talks about a command "dom
setResultEncoding ?encodingName?" but trying it results in an unknown
command error.
greg
2024-12-17 02:13:14 UTC
Post by saito
I am trying to see why tdom is failing on this json snippet.
package req tdom
set x {{"name":"Jeremi"}}
dom parse -json $x
==> error "JSON syntax error" at position 15
"{"name":"Jeremi <--Error-- "}"
If it doesn't get removed by the newsgroup editors, there is a weird
character at the very end of x.  It looks almost like "[]" but it is
not.  When you edit it, it acts as if it has multiple characters in it.
Another problem is that tdom man page talks about a command "dom
setResultEncoding ?encodingName?" but trying it results in an unknown
command error.
Hello,

The unknown character is code 7, the ASCII BEL (bell) control character.
It is probably not allowed as a raw character in a JSON string.
Use the escape instead: \u0007

Gregor


package req tdom

proc chr c {
  if {[string length $c] > 1 } {
    error "chr: arg should be a single char"
  }
  set v 0
  scan $c %c v
  return $v
}

# Check character types and provide additional information
proc charInfo char {
  if {[string is control $char]} {
    return "control character"
  } elseif {[string is space $char]} {
    return "space character"
  } elseif {[string is digit $char]} {
    return "digit character"
  } elseif {[string is lower $char]} {
    return "lowercase alphabetic character"
  } elseif {[string is upper $char]} {
    return "uppercase alphabetic character"
  } elseif {[string is punct $char]} {
    return "punctuation character"
  } elseif {[string is graph $char]} {
    return "graphical character"
  } elseif {[string is print $char]} {
    return "printable character"
  } else {
    return "unknown character type"
  }
}

proc infochar {x} {
  puts $x
  set i 0
  while {$i < [string length $x]} {
    set c [string index $x $i]
    puts "$i is $c [charInfo $c] [chr $c] "
    incr i
  }
}

set x {{"name":"Jeremi"}}
infochar $x
catch {dom parse -json $x} mess
puts "mess: $mess"

set x {{"name":"Jeremi\u0007"}}
set doc [dom parse -json $x]
puts [$doc asXML]
saito
2024-12-17 04:51:11 UTC
Post by greg
Hello,
The unknown character is 007 or BELL.
Probably not allowed as a char in  string.
Instead: \u0007
Gregor
Thank you and Rich for the wonderful info and the code.

The JSON data is what I receive from an API. I first thought it had to
do with encoding issues. It happens frequently, so maybe I will ask
them to be more careful with their JSON data generation.
Rich
2024-12-17 04:59:22 UTC
Post by saito
Post by greg
Hello,
The unknown character is 007 or BELL.
Probably not allowed as a char in  string.
Instead: \u0007
Gregor
Thank you and Rich for the wonderful info and the code.
The JSON data is what I receive from an API. I first thought it had
to do with encoding issues. It happens frequently, so maybe I will
ask them to be more careful with their JSON data generation.
If you are getting it from an API then you've found a bug if the API
is /really/ sending raw control characters as part of a JSON string.
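
Until the API is fixed, one possible workaround is to escape the raw control characters before handing the text to tDOM. The following is only a minimal sketch: escapeJsonControls is a hypothetical helper, not part of tDOM, and it assumes the input is otherwise well-formed JSON. It tracks whether it is inside a string so that ordinary whitespace between tokens is left alone.

```tcl
# Hypothetical helper (not part of tDOM): rewrite raw control characters
# (code points below 0x20) that occur inside JSON strings as \uXXXX escapes.
# Assumes the input is otherwise well-formed JSON.
proc escapeJsonControls {json} {
    set out ""
    set inString 0   ;# currently inside a "..." string?
    set escaped 0    ;# was the previous character a backslash?
    foreach c [split $json ""] {
        if {!$inString} {
            if {$c eq "\""} { set inString 1 }
            append out $c
        } elseif {$escaped} {
            # Character after a backslash: copy it through unchanged.
            set escaped 0
            append out $c
        } elseif {$c eq "\\"} {
            set escaped 1
            append out $c
        } elseif {$c eq "\""} {
            set inString 0
            append out $c
        } elseif {[scan $c %c code] == 1 && $code < 0x20} {
            # Braces keep the backslash literal; format only expands %04x.
            append out [format {\u%04x} $code]
        } else {
            append out $c
        }
    }
    return $out
}

# Possible usage, assuming tdom is loaded and $x holds the API response:
#   set doc [dom parse -json [escapeJsonControls $x]]
```

This keeps the data (the BEL ends up in the parsed value as an escaped character) rather than silently dropping it; whether that is what you want depends on what the API meant to send.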
Alan Grunwald
2024-12-19 16:36:20 UTC
On 17/12/2024 02:13, greg wrote:

<snip>
Post by greg
proc chr c {
  if {[string length $c] > 1 } {
    error "chr: arg should be a single char"
  }
  set v 0
  scan $c %c v
  return $v
}
# Check character types and provide additional information
proc charInfo char {
  if {[string is control $char]} {
    return "control character"
  } elseif {[string is space $char]} {
    return "space character"
  } elseif {[string is digit $char]} {
    return "digit character"
  } elseif {[string is lower $char]} {
    return "lowercase alphabetic character"
  } elseif {[string is upper $char]} {
    return "uppercase alphabetic character"
  } elseif {[string is punct $char]} {
    return "punctuation character"
  } elseif {[string is graph $char]} {
    return "graphical character"
  } elseif {[string is print $char]} {
    return "printable character"
  } else {
    return "unknown character type"
  }
}<snip>
Many thanks from me too for the above procs, which have made their way
(with acknowledgement) into my personal library of utility routines.

Alan

Rich
2024-12-17 04:20:54 UTC
Post by saito
I am trying to see why tdom is failing on this json snippet.
package req tdom
set x {{"name":"Jeremi^G"}}
dom parse -json $x
==> error "JSON syntax error" at position 15
"{"name":"Jeremi^G <--Error-- "}"
Assuming the ^G that came through properly represents the
character, then greg is right: it is an ASCII bell character, and per
the JSON spec [1] raw control characters are not allowed to be part of
a JSON string.

That is why tDOM reports the error at the ^G.

Are you on Linux? If so, the hexdump, objdump, or xxd (xxd is easiest
to use) commands will show you exactly what raw byte values exist in
the file.


[1] https://www.json.org/json-en.html
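
If those tools are not at hand, much the same inspection can be done from Tcl itself. This is a rough sketch only: dumpCodepoints and dumpUtf8Hex are hypothetical helper names, and the second one assumes Tcl 8.6 or later for "binary encode hex".

```tcl
# Hypothetical helper: list the code point of every character in a string.
proc dumpCodepoints {s} {
    set out {}
    foreach c [split $s ""] {
        scan $c %c code
        lappend out [format "U+%04X" $code]
    }
    return $out
}

# Hypothetical helper: hex digits of the UTF-8 bytes of a string
# (assumes Tcl 8.6+, where "binary encode hex" is available).
proc dumpUtf8Hex {s} {
    binary encode hex [encoding convertto utf-8 $s]
}

# Example: the tail of the string from the original post, with the BEL
# written as a Tcl escape so that it survives copy and paste.
puts [dumpCodepoints "i\u0007\""]
puts [dumpUtf8Hex "i\u0007"]
```

A code point below U+0020 appearing between the quotes of a JSON string is what the parser is objecting to.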
Rolf Ade
2024-12-18 14:04:07 UTC
Post by saito
I am trying to see why tdom is failing on this json snippet.
package req tdom
set x {{"name":"Jeremi"}}
dom parse -json $x
==> error "JSON syntax error" at position 15
"{"name":"Jeremi <--Error-- "}"
Rich already rightly pointed out that control characters are not allowed
literally in JSON strings. As tDOM rightly complains, your input is not
valid JSON.

[snip]
Post by saito
Another problem is that tdom man page talks about a command "dom
setResultEncoding ?encodingName?" but trying it results in an unknown
command error.
You are obviously using a (very) old tDOM version. The dom method
setResultEncoding is a relic from the time when tDOM still supported
Tcl 8.0 (and the functionality was only needed / useful if built/used
with Tcl 8.0).

The documentation and implementation of this method were removed with
tDOM 0.9.1 (more than six years ago). The most recent version is 0.9.5.

rolf
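
A quick way to see which side of that change an installation is on is to look at the version that "package require" returns. A minimal sketch, where predatesRemoval is a hypothetical helper name and the 0.9.1 cutoff is taken from the statement above:

```tcl
# Hypothetical helper: true if the given tDOM version predates 0.9.1,
# the release in which (per this thread) setResultEncoding was removed.
proc predatesRemoval {version} {
    expr {[package vcompare $version 0.9.1] < 0}
}

# Report on the installed tDOM, if any; do nothing when it is not available.
if {![catch {package require tdom} version]} {
    if {[predatesRemoval $version]} {
        puts "tDOM $version: dom setResultEncoding may still be present"
    } else {
        puts "tDOM $version: dom setResultEncoding has been removed"
    }
}
```

If the reported version is 0.9.1 or later, a man page that still mentions setResultEncoding is describing an older release.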
saito
2024-12-18 19:57:11 UTC
Post by Rolf Ade
You are obviously using a (very) old tDOM version. The dom method
setResultEncoding is a relic from the time when tDOM still supported
Tcl 8.0 (and the functionality was only needed / useful if built/used
with Tcl 8.0).
The documentation and implementation of this method was removed with
tDOM 0.9.1 (more than six years ago). Most recent version is 0.9.5.
Thanks for the info. I am using version 0.9.5, which I downloaded from its
official site some time ago. It comes with no documentation, so I did an
internet search. That piece of info evidently came from an outdated web
page, as I had suspected.
Harald Oehlmann
2024-12-18 20:49:14 UTC
Post by saito
Thanks for the info. I am using version 0.9.5 I downloaded from its
official site some time ago.  It comes with no documentation so I did an
internet search.  I guess that piece of info is from an outdated web
page obviously, which I kind of guessed.
http://tdom.org/index.html/doc/trunk/doc/index.html
saito
2024-12-18 22:29:54 UTC
Post by Harald Oehlmann
Post by saito
Thanks for the info. I am using version 0.9.5 I downloaded from its
official site some time ago.  It comes with no documentation so I did
an internet search.  I guess that piece of info is from an outdated
web page obviously, which I kind of guessed.
http://tdom.org/index.html/doc/trunk/doc/index.html
Thanks, good to know.