ANNOUNCE: Excel file format reader/writer package OOXML 1.9 released
Harald Oehlmann
2024-11-29 14:27:38 UTC
Dear TCL team,

OOXML can read and write Excel files.

New features are:
- Set header/footer
- TCL 8.6: optionally read using tcllib::zip::read module, so binary
package vfs::zip is not required any more
- If vfs::zip is used, version 1.0.4 is required
- More checks for invalid files when reading

So, the requirements are:
- TDOM 0.9
- TCL 8.6.7

And for reading one of:
- tcllib::zip::read
- vfs::zip version 1.0.4 or later
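The requirements above could be checked from a script before loading the package. The following is only a minimal sketch: it assumes the packages are registered under the names `tdom` and `vfs::zip`, and `check_requirement` is a hypothetical helper, not part of ooxml. The tcllib reader is left out because its exact package name is not confirmed here.

```tcl
# Hypothetical helper: returns {1 version} if the package loads in the
# requested version, {0 errorMessage} otherwise.
proc check_requirement {name version} {
    if {[catch {package require $name $version} result]} {
        return [list 0 $result]
    }
    return [list 1 $result]
}

# Tcl itself: 8.6.7 or later ("8.6.7-" means no upper bound).
if {[package vsatisfies [info patchlevel] 8.6.7-]} {
    puts "Tcl [info patchlevel]: ok"
} else {
    puts "Tcl [info patchlevel]: too old"
}

# Assumed package names; adjust if your installation differs.
foreach {name version} {tdom 0.9 vfs::zip 1.0.4} {
    lassign [check_requirement $name $version] ok detail
    if {$ok} {
        puts "$name: ok (version $detail)"
    } else {
        puts "$name: not usable: $detail"
    }
}
```

Since vfs::zip is only one of the two reader options, a failed check for it is not fatal by itself.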

The download page is here:
https://fossil.sowaswie.de/ooxml/uv/download.html

Thanks to all contributors!
Harald (on behalf of the very busy group)
TorstenBerg
2024-12-05 12:06:34 UTC
Hi,

thanks for the new version. This is much appreciated and the new options
on paper size and orientation work nicely.

One issue that I found:

When I have a Tcl script encoded in utf-8 and that script writes the
xlsx file, then umlauts come out weird. Is there an option that can
handle this or does ooxml assume or expect text input to be in a
specific encoding?


And an idea:

When formatting cells using the '-style' option of the 'cell' method,
it would be cool to be able to specify more than one style (e.g. as a
list of styleIDs). Then you could have one style for font styling and
another for borders and then combine those two to get cells with a
specific font and border. Conflicting elements of two different styles
of the list could be handled so that a style later in the list would
overwrite settings for the identical option in a previous style in the
list.

Regards, Torsten
Harald Oehlmann
2024-12-05 12:53:04 UTC
Hi Torsten,
thanks for the message. Please use the tickets in the tracker:
https://fossil.sowaswie.de/ooxml/ticket
You may author two tickets.

About the "Umlauts". This should be an internal issue. The outputted
data is utf-8 afaik. But this is critical.
I have tested Umlauts when reading and that works.
I had to add an "encoding convertfrom utf-8 $data" to make it work.

Take care,
Harald
Ralf Fassel
2024-12-05 14:46:57 UTC
* Harald Oehlmann <***@yahoo.com>
| About the "Umlauts". This should be an internal issue. The outputted
| data is utf-8 afaik. But this is critical.
| I have tested Umlauts when reading and that works.
| I had to add an "encoding convertfrom utf-8 $data" to make it work.

Wouldn't that also depend on how exactly the Tcl script is sourced?
I.e. a Tcl script containing literal utf-8 data (not the \uxxxx form)
on Windows with the default system encoding (e.g. cp1252) would require an
explicit -encoding utf8 for the 'source' command to read it properly.
The OP did not specify what OS he was on, and how the Tcl script
containing utf-8 data was sourced...

R'
Harald Oehlmann
2024-12-05 15:16:41 UTC
Ralf,
my message was misleading: I have introduced the convertfrom into the
source code for reading Excel (not writing). Perhaps this is missing
or there is another error, I don't know.

Looking a bit in the source code:
proc ooxml::Dom2zip {zf node path cd count} {
    upvar $cd mycd
    upvar $count mycount
    append mycd [::ooxml::add_str_to_archive $zf $path \
        [$node asXML -indent none -xmlDeclaration 1 -encString "UTF-8"]]
    incr mycount
}

Ok, always UTF-8

Later:
proc ::ooxml::add_str_to_archive {zipchan path data {comment {}}} {
...
set utfdata [encoding convertto utf-8 $data]

So, I see no issue in the code. I have no idea what happens here.

I would write the relevant data to a utf-8 flat file for debugging:

set h [open debug.txt w]
fconfigure $h -encoding utf-8
puts $h $data
close $h
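
As a complement to the debug file, the code points of the string could be inspected directly. The sketch below is only an illustration under assumptions: `codepoints` is a hypothetical helper, and the "bad" string simulates a utf-8 script being read as cp1252, which may or may not be what happens in the reported case.

```tcl
# Hypothetical helper: list the Unicode code points of a string.
proc codepoints {s} {
    set out {}
    foreach ch [split $s ""] {
        lappend out [format U+%04X [scan $ch %c]]
    }
    return $out
}

set good "\u00e4"   ;# a-umlaut as a single character

# Simulate the suspected failure: the utf-8 bytes of the umlaut are
# decoded as cp1252, so each byte becomes its own character.
set bad [encoding convertfrom cp1252 [encoding convertto utf-8 $good]]

puts [codepoints $good]   ;# U+00E4
puts [codepoints $bad]    ;# U+00C3 U+00A4

# The writer then encodes whatever it receives: 2 bytes for the
# correct string, 4 bytes for the already damaged one.
puts [string length [encoding convertto utf-8 $good]]
puts [string length [encoding convertto utf-8 $bad]]
```

If the data shows two code points per umlaut before it reaches the writer, the damage happened when the script was read, not in the "encoding convertto utf-8" step.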

Thanks,
Harald
TorstenBerg
2024-12-05 21:52:35 UTC
Hi,

thanks for your ideas regarding the encoding. I will investigate further and
see whether I can find the culprit. The problem occurs on a Windows
machine running a utf-8 encoded script with Tcl 8.6. So maybe this
combination is already bad (it probably is), since Windows will expect
the Tcl file to be in cp1252 or so ...
Harald Oehlmann
2024-12-06 07:31:01 UTC
All my pkgIndex.tcl files have this:

source -encoding utf-8 $file
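
In a pkgIndex.tcl file this usually appears inside a "package ifneeded" entry. A minimal sketch, where the package name `demo`, its version, and the file name `demo.tcl` are made up for illustration, and `dir` is set by hand only so the snippet runs outside the package loader:

```tcl
# Inside a real pkgIndex.tcl, $dir is provided by the package loader.
# It is set here only to make the sketch self-contained.
set dir [pwd]

# Register the load script; the file is sourced as utf-8 regardless
# of the system encoding. Names are hypothetical.
package ifneeded demo 1.0 \
    [list source -encoding utf-8 [file join $dir demo.tcl]]

# Show the registered load script.
puts [package ifneeded demo 1.0]
```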
Ralf Fassel
2024-12-06 10:01:01 UTC
* ***@typoscriptics.de (TorstenBerg)
| thanks for your ideas wrt. the encoding. I will investigate further and
| see whether I can find the culprit. The phenomenon is found on a Windows
| machine running a script being utf-8 with Tcl 8.6. So, maybe this
| combination is already bad (it probably is) since Windows will expect
| the Tcl file to be in cp1252 or so ...

Definitely:

https://www.tcl.tk/man/tcl/TclCmd/source.htm

SYNOPSIS
source fileName
source -encoding encodingName fileName

[...]
The -encoding option is used to specify the encoding of the data stored
in fileName. When the -encoding option is omitted, the system encoding
is assumed.

See also Harald's response (always specify the encoding with 'source'
when the file is not ASCII). You could use the \u notation if there are
only a few Unicode characters in the file (with many, the file becomes
unreadable IMHO).
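
The effect can be reproduced on any platform with a small sketch. It assumes Tcl 8.6 or later (for "file tempfile") and uses "-encoding cp1252" to stand in for a Windows system encoding; the variable names are made up for illustration.

```tcl
# Write a one-line script whose bytes are utf-8. The \u form keeps
# this sketch itself pure ASCII; the file receives a literal umlaut.
set h [file tempfile path]
fconfigure $h -encoding utf-8
puts $h "set word \"M\u00fcller\""
close $h

# Sourced as utf-8: the umlaut is one character.
source -encoding utf-8 $path
set asUtf8 $word

# Sourced as cp1252: each utf-8 byte becomes its own character.
source -encoding cp1252 $path
set asCp1252 $word

puts [string length $asUtf8]     ;# 6
puts [string length $asCp1252]   ;# 7
file delete $path
```

The second result is the kind of string that shows up as weird umlauts once it is written to a file.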

R'
