Thread: Re: [TCLCORE] Unicode in Tcl 9 - a commentary and critique (Page 2)

Re: [TCLCORE] Unicode in Tcl 9 - a commentary and critique

From: Christian G. <aur...@gm...> - 2023-02-01 20:47:09

On 01.02.23 at 12:45, apnmbx-public--- via Tcl-Core wrote:
> A comment on Christian's -erroron mask suggestion.
>
> -erroron would define what constitutes an error. But it does not say what
> should be done in case of that error which I think is the more important
> issue to address.
>
> So for example, if \xC0 is encountered in [encoding convertfrom utf-8],
> should that be mapped to U+00C0, mapped to U+FFFD, raise an exception etc. I
> think that is more important than distinguishing between error cases like
> surrogate in utf-8 vs \xC0 in utf-8.
>
> So while it may have some use, it doesn't really address the current
> discussion.


OK thanks, then how about an expanded variant:

-handle {SURROGATE error INVALID replace INCOMPLETE ignore ...}

Basically, what I would suggest is a way to configure the behaviour
during the en/decoding and then, of course, set some sensible default -
e.g. the same behaviour that Python uses, or Tcl8 - but leave it open
for the future programmer to set the error handling to their liking. It
may be application dependent, that's why "strict" and "nocomplain" etc.
exist - just that I do not think one should hardcode those, especially
with "weird" names that do not explain what is going on.

	Christian
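
For reference, the Python behaviour mentioned above can be sketched as follows. This is an illustration in Python, not Tcl, and not a proposal for the Tcl API; the sample bytes and the helper name decode_with are made up for the example. It applies three of Python's standard error handlers to the \xC0 case from the quoted message, and also shows the ISO 8859-1 reading that yields U+00C0.

```python
# Python codecs take an "errors" argument naming the handler to apply
# when a byte sequence cannot be decoded.
data = b"A\xc0B"  # 0xC0 can never appear in well-formed UTF-8


def decode_with(handler):
    """Decode `data` as UTF-8 with the given handler; describe any error."""
    try:
        return data.decode("utf-8", errors=handler)
    except UnicodeDecodeError as exc:
        return f"error at byte {exc.start}"


results = {h: decode_with(h) for h in ("strict", "replace", "ignore")}

# Decoding the same bytes as ISO 8859-1 maps 0xC0 to U+00C0 instead.
as_latin1 = data.decode("latin-1")

for handler, value in results.items():
    print(f"{handler:8} -> {value!r}")
print(f"latin-1  -> {as_latin1!r}")
```

The point of the sketch is only that "what counts as an error" and "what to do about it" are separate knobs: here the error condition is fixed and only the handling varies.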

Re: [TCLCORE] Unicode in Tcl 9 - a commentary and critique

From: Poor Y. <org...@po...> - 2023-02-02 00:55:05

On 2023-02-01 22:46, Christian Gollwitzer wrote:
> Basically, what I would suggest is a way to configure the behaviour
> during the en/decoding and then, of course, set some sensible default -
> e.g. the same behaviour that Python uses, or Tcl8 - but leave it open
> for the future programmer to set the error handling to their liking. It
> may be application dependent, that's why "strict" and "nocomplain" etc.
> exist - just that I do not think one should hardcode those, especially
> with "weird" names that do not explain what is going on.
> 

That was the motivation for the "trunk-encodingdefaultorig" and
"trunk-encodingdefaultstrict" branches, which are what I'd like to see trunk
become.  "-strict" is still there, but it just means what it does in Python:
"Return an error if there is an encoding problem".

Here's another possibility for passing encoding options:  Make the value of
"-encoding" a list:

	chan configure $chan -encoding {utf-8 strict ...}

To change options without changing the encoding:

	chan configure $chan -encoding {{} strict ...}


-- 
Yorick

Re: [TCLCORE] Unicode in Tcl 9 - a commentary and critique

From: Peter Da S. <pet...@fl...> - 2023-02-02 02:34:39


> Here's another possibility for passing encoding options:  Make the value of
> "-encoding" a list:
> 
> 	chan configure $chan -encoding {utf-8 strict ...}
> 
> To change options without changing the encoding:
> 
> 	chan configure $chan -encoding {{} strict ...}

Why not [chan configure $chan -encoding {strict ...}] and treat the actual encoding as just another option? There shouldn't be any possibility of a collision.

Re: [TCLCORE] Unicode in Tcl 9 - a commentary and critique

From: Poor Y. <org...@po...> - 2023-02-02 11:39:30

On 2023-02-02 04:19, Peter Da Silva wrote:
>> Here's another possibility for passing encoding options:  Make the value of
>> "-encoding" a list:
>> 
>> 	chan configure $chan -encoding {utf-8 strict ...}
>> 
>> To change options without changing the encoding:
>> 
>> 	chan configure $chan -encoding {{} strict ...}
> 
> Why not [chan configure $chan -encoding {strict ...}] and treat the 
> actual encoding as just another option? There shouldn't be any 
> possibility of a collision.

What about an encoding named "strict"?  I'm not in favor of constraining the
possible encoding names.  According to Jan, a side-effect of TIP 601 means that
an encoding name ending in "-" is now not allowed:

	https://core.tcl-lang.org/tcl/tktview/a31caff05780aabdd20a8468673340435862dbf9

I hope that, too, can be remedied.

The value of "-encoding" could be a dictionary:
	chan configure $chan -encoding {name utf-8 strict 1 surrogates 0 ...}
If the number of items in the list is odd, "name" could be implied:
	chan configure $chan -encoding {utf-8 strict 1 surrogates 0 ...}

	chan configure $chan -encoding utf-8

The rule for conflicting option arguments could be that later arguments take
precedence.
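
The parsing rule described above can be sketched in a few lines. The sketch is in Python rather than Tcl and is purely hypothetical: the function name parse_encoding_spec, the current_name parameter and the returned dictionary are assumptions for illustration, and nothing like this exists in Tcl. It only shows that "odd length implies name" plus "later arguments win" is unambiguous, even for an encoding that happens to be called "strict".

```python
def parse_encoding_spec(items, current_name="utf-8"):
    """Hypothetical parser for a list-valued -encoding option.

    `items` is the already-split list; `current_name` stands for the
    encoding currently configured on the channel.
    """
    items = list(items)
    if len(items) % 2 == 1:
        # Odd length: the first element is taken as the encoding name.
        items = ["name"] + items
    spec = {"name": current_name}
    # Walk key/value pairs in order so that later pairs override earlier ones.
    for key, value in zip(items[0::2], items[1::2]):
        spec[key] = value
    return spec


print(parse_encoding_spec(["utf-8", "strict", "1", "surrogates", "0"]))
print(parse_encoding_spec(["name", "utf-8", "strict", "1", "strict", "0"]))
print(parse_encoding_spec(["strict"]))  # an encoding that is itself named "strict"
print(parse_encoding_spec(["strict", "1"], current_name="iso8859-1"))
```

An even-length list without a "name" key leaves the current encoding untouched, which covers the "change options without changing the encoding" case.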

-- 
Yorick

Re: [TCLCORE] Unicode in Tcl 9 - a commentary and critique

From: <apn...@ya...> - 2023-02-02 17:11:02

Jan,

I’ve implemented the profile based scheme below (strict, tcl8, {}) in the https://core.tcl-lang.org/tcl/timeline?r=apn-encoding-profile branch.

As the diffs show, only the option parsing code is changed, so risk is minimal. Internal flags and code are completely unchanged. That could use some refactoring when other profiles are added.

One (intentional) difference from the current core-8-branch behavior:

The existing 8.7 behavior allowed ambiguity when 2 arguments were provided to the encoding convert{from,to} commands, whereby the first argument could be interpreted as either an option or an encoding name. I don’t think this ambiguity is a good idea, and moreover it breaks 8.6 compatibility in a strict sense. In the new profile based implementation, there is no ambiguity as the first argument is always treated as the encoding name. This means that if options are specified, the encoding name must be specified and cannot be defaulted. This is a Good Thing imo and makes the command both 8.6 compatible and less susceptible to programming errors.

Further, I am not sure of the interaction (for 8.7) or even the intent of the CHANNEL_ENCODING_NOCOMPLAIN flag to configure. It seems to have no effect except for some interaction with TCL_NO_DEPRECATED for reasons I don’t understand.

The tests are still being worked on.

If you are amenable to these changes (I’d hope you are, given your comments below), I’ll write up the TIP (or have Nathan adapt TIP 654) for 9.0.

/Ashok

From: Jan Nijtmans <jan...@gm...>
Sent: Wednesday, February 1, 2023 9:23 PM
To: apn...@ya...
Cc: tcl...@li...
Subject: Re: [TCLCORE] Unicode in Tcl 9 - a commentary and critique

On Wed, 1 Feb 2023 at 16:22, apnmbx-public--- via Tcl-Core wrote:

> -profile strict -> flags for -strict
> -profile “tcl8” -> implicit default flags
> -profile nocomplain -> -nocomplain flags (although I would prefer just getting rid of this option)

Then I would suggest:

      -profile strict     -> flags for -strict
      -profile tcl8       -> flags for -nocomplain (since this is the default for Tcl 8)
      -profile {}         -> no flags (default for Tcl 9)

> (and fconfigure equivalents) be fairly straightforward?

Indeed

Regards,
       Jan Nijtmans

Re: [TCLCORE] Unicode in Tcl 9 - a commentary and critique

From: Peter Da S. <pet...@fl...> - 2023-02-02 17:53:17

I really like this idea. It also adds the option of turning flags off (eg {strict 0})

The value of "-encoding" could be a dictionary:
        chan configure $chan -encoding {name utf-8 strict 1 surrogates 0 ...}
If the number of items in the list is odd, "name" could be implied:
        chan configure $chan -encoding {utf-8 strict 1 surrogates 0 ...}

        chan configure $chan -encoding utf-8

Re: [TCLCORE] Unicode in Tcl 9 - a commentary and critique

From: <apn...@ya...> - 2023-02-03 03:56:12

The idea has some merit but I have a couple of concerns with the approach
below.

At first glance it tackles a different problem than what is being discussed.
It addresses configuration of what is to be considered an invalid byte
sequence. It does not address how a sequence considered invalid is to be
handled (map to U+FFFD, map to lossless, map to numeric equivalent etc.).
Now one could add those as additional dictionary options/keys but that
increases complexity from a user perspective (what does "strict 1 surrogates
0 invalid 1" etc. mean?). And the user / application does not care in the
vast majority of cases where the error stems from (exception being the
needmoredata case which is a separate category discussed elsewhere). It
feels like over-generalization to me.

Second, and possibly more important, I foresee considerable implementation
complexity in the encoders to handle this fine-grained, "tunable"
configuration. Particularly so since there is no mechanism currently to pass
this down into the encoder "call chains" and would entail API changes. Of
course, I might be wrong and a prototype implementation could immediately
refute this "implementability" concern.

/Ashok

From: Peter Da Silva <pet...@fl...>
Sent: Thursday, February 2, 2023 10:49 PM
To: Poor Yorick <org...@po...>; Tcl Core List <tcl...@li...>
Subject: Re: [TCLCORE] Unicode in Tcl 9 - a commentary and critique

I really like this idea. It also adds the option of turning flags off (eg {strict 0})

The value of "-encoding" could be a dictionary:
        chan configure $chan -encoding {name utf-8 strict 1 surrogates 0 ...}
If the number of items in the list is odd, "name" could be implied:
        chan configure $chan -encoding {utf-8 strict 1 surrogates 0 ...}

        chan configure $chan -encoding utf-8

Re: [TCLCORE] Unicode in Tcl 9 - a commentary and critique

From: Jan N. <jan...@gm...> - 2023-02-02 22:09:39

On Thu, 2 Feb 2023 at 18:11, apnmbx-public--- via Tcl-Core wrote:

> Jan,
>
>
>
> I’ve implemented the profile based scheme below (strict, tcl8, {}) in the
> https://core.tcl-lang.org/tcl/timeline?r=apn-encoding-profile branch.
>

Thanks.  I'll have a look.


> Further, I am not sure of the interaction (for 8.7) or even the intent of
> the CHANNEL_ENCODING_NOCOMPLAIN flag to configure. It seems to have no
> effect except for some interaction with TCL_NO_DEPRECATED for reasons I
> don’t understand.
>

That's correct. For 8.7 CHANNEL_ENCODING_NOCOMPLAIN is the default,
so setting the flag or not makes no difference. If you compile Tcl 8.7
with CFLAGS=-DTCL_NO_DEPRECATED, the encodings will start
behaving like in Tcl 9.0. This way it's possible to test the 9.0 behavior
without continuously switching branches, which speeds up development
while maintaining 2 branches (8.7 and 9.0).

Hope this helps,
    Jan Nijtmans

Re: [TCLCORE] Unicode in Tcl 9 - a commentary and critique

From: Jan N. <jan...@gm...> - 2023-02-03 13:18:17

On Thu, 2 Feb 2023 at 23:09, Jan Nijtmans wrote:

> Thanks.  I'll have a look.
>

Well, I know the testcases are still not handled, that's OK. But
two things I can already report (which you - most likely -
didn't notice yet).

First, an example in Tcl 8.7:
    $ tclsh8.7 (core-8-branch)
    % fconfigure stdin
-blocking 1 -buffering line -buffersize 4096 -encoding utf-8 -eofchar {}
-nocomplainencoding 1 -strictencoding 0 -translation auto -closemode
default -inputmode normal -mode 38400,n,8,1 -xchar { }
Note the "-nocomplainencoding 1 -strictencoding 0". This should correspond
to "-encodingprofile tcl8".
    $ tclsh8.7 (apn-encoding-profile)
    % fconfigure stdin
-blocking 1 -buffering line -buffersize 4096 -encoding utf-8
-encodingprofile {} -eofchar {} -translation auto -closemode default
-inputmode normal -mode 38400,n,8,1 -xchar { }
    %
I see "-encodingprofile {}". That's not what I would expect. Then, another
issue:
    % fconfigure stdin -e
    {}
    % fconfigure stdin -en
    utf-8
    % fconfigure stdin -encoding
    utf-8
    % fconfigure stdin -encodingp
    {}

So, specifying "-e" or "-encodingp" selects "-encodingprofile", but when
the string has length 2 up to 9 it selects "-encoding". My suggestion: just
use "-profile".
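
Why a shared prefix is awkward can be seen with a generic unique-prefix matcher. The sketch below is Python, and it is not Tcl's actual option-matching code (which evidently behaves differently, as the transcript above shows); the function name resolve_option and the shortened option lists are assumptions for illustration. It shows that, with both "-encoding" and "-encodingprofile" present, every abbreviation shorter than the full "-encoding" is ambiguous, while renaming the option to "-profile" removes the overlap.

```python
def resolve_option(abbrev, options):
    """Resolve an abbreviated option name by unique-prefix matching."""
    if abbrev in options:
        return abbrev  # an exact match always wins
    matches = [opt for opt in options if opt.startswith(abbrev)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"unknown option {abbrev!r}")
    raise ValueError(f"ambiguous option {abbrev!r}: {', '.join(sorted(matches))}")


# Shortened, illustrative option lists (not the full fconfigure set).
WITH_ENCODINGPROFILE = ["-blocking", "-encoding", "-encodingprofile", "-eofchar"]
WITH_PROFILE = ["-blocking", "-encoding", "-profile", "-eofchar"]

print(resolve_option("-encodingp", WITH_ENCODINGPROFILE))
print(resolve_option("-encoding", WITH_ENCODINGPROFILE))
print(resolve_option("-en", WITH_PROFILE))
print(resolve_option("-p", WITH_PROFILE))
try:
    resolve_option("-en", WITH_ENCODINGPROFILE)
except ValueError as exc:
    print(exc)
```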

That's it for now.

Regards,
       Jan Nijtmans

Re: [TCLCORE] Unicode in Tcl 9 - a commentary and critique

From: Donald G P. <don...@ni...> - 2023-02-03 18:53:53

On 1/27/23 10:36, apnmbx-public--- via Tcl-Core wrote:
>
> I’ve written up my view of “state of Unicode in Tcl 9” at https://www.magicsplat.com/tcl9/tcl9unicode.html
>

Thank you for putting this together.  You have a real talent for writing that captures precise details without becoming too tedious. This is very useful.  The document reveals many things that are different from what I assumed, and raises other shortcomings.

It appears to me that a good stress-testing use case is the task to create and then later unpack a lossless archive of a directory in a filesystem.  This task intersects with many of the issues presented in the document. It also touches on the ill-defined Tcl concept of the "system encoding" and whether it needs revision.

A key command in any archive creation is [glob].  It seems that [glob] has never been capable of handling file names that fall outside the system encoding.* Revising Tcl 9 so that it can handle such file names suggests a few possibilities:

   1) Have all Tcl strings make use of PEP 383, or some alternative means to represent such names.  This implies that the alphabet for Tcl strings must include symbols outside the set of unicode scalar values.  (PEP 383 uses lone surrogates); OR

   2) Revise [glob] so it returns encoding information in addition to the list of filenames.  The iso-8859-1 encoding is a way to capture every filename losslessly, but it is a poor universal solution. Design of such a reformed [glob] with reasonable compatibility is at least tricky; OR

   3) Create a new [glob2]** that returns encoding information without messy compatibility constraints, and leave it up to scripts and extensions to move from the old command to the new one as they perceive a need to robustly handle these edge cases.
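
For readers unfamiliar with PEP 383, possibility 1) can be illustrated with Python's own "surrogateescape" error handler, alongside the iso-8859-1 round trip mentioned in possibility 2). This is a Python sketch of the existing Python mechanism, not of anything in Tcl; the sample file name is made up.

```python
raw_name = b"caf\xe9.txt"  # Latin-1 bytes; not valid UTF-8

# PEP 383: each undecodable byte 0xNN becomes the lone surrogate U+DCNN.
as_text = raw_name.decode("utf-8", errors="surrogateescape")
round_trip = as_text.encode("utf-8", errors="surrogateescape")

# The resulting string is outside the set of Unicode scalar values,
# so a strict UTF-8 encoder refuses it.
try:
    as_text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# ISO 8859-1 maps every byte value to the code point of the same value,
# so it round-trips arbitrary bytes, at the cost of mislabelling the text.
all_bytes = bytes(range(256))
latin1_round_trip = all_bytes.decode("latin-1").encode("latin-1")

print(ascii(as_text))
print(round_trip == raw_name, strict_ok, latin1_round_trip == all_bytes)
```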

Both 2) and 3) may impose constraints and demand revision to the Tcl_Filesystem interface and its Tcl_FSMatchInDirectoryProc slot. The encoding to be used to interpret the bytes of a filename might better be an attribute of a Tcl_Filesystem or of a mount point rather than an application-wide (and not thread-stable?) notion of a system encoding pulled in through a side channel.

For 1) if the alphabet for Tcl strings is larger than unicode scalar values, that provides a clear use and meaning for [string is unicode] which has puzzled some people.  Maybe a change to [string is usv] would be clearer to the reader that the test is whether symbols outside the set of unicode scalar values are present.  These are symbols that cannot be properly encoded in the Unicode encodings utf-8, utf-16, utf-32.

Thanks again.  Plenty to be done here.

* The one use case I've seen presented for scripts to be able to set the system encoding with [encoding system $enc] has been the power to work around this problem.

** not a final name choice, just the concept of a new command.

-- 
| Don Porter            Applied and Computational Mathematics Division |
| don...@ni...             Information Technology Laboratory |
| http://math.nist.gov/~DPorter/                                  NIST |
|______________________________________________________________________|

Re: [TCLCORE] Unicode in Tcl 9 - a commentary and critique

From: Rolf A. <tcl...@po...> - 2023-02-04 01:34:14

Donald G Porter via Tcl-Core writes:
> On 1/27/23 10:36, apnmbx-public--- via Tcl-Core wrote:
>>
>> I’ve written up my view of “state of Unicode in Tcl 9” at
>> https://www.magicsplat.com/tcl9/tcl9unicode.html

> [...] Revise [glob] [...]

I agree with that. 

But no matter what or how ...

> For 1) if the alphabet for Tcl strings is larger than unicode scalar
> values, that provides a clear use and meaning for [string is unicode]
> which has puzzled some people.  Maybe a change to [string is usv]
> would be clearer to the reader that the test is whether symbols
> outside the set of unicode scalar values are present.  These are
> symbols that cannot be properly encoded in the Unicode encodings
> utf-8, utf-16, utf-32.

In its current existence on trunk (and during its whole short lifetime
AFAIK) [string is unicode] returns 0 on surrogate _and_ "noncharacter"
code-points. But there is no doubt that "noncharacter" code-points _can_
be properly encoded in utf-8, utf-16, utf-32.
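
The distinction can be checked against any conforming codec implementation. The Python sketch below (Python's codecs, not Tcl's; the helper name encodable is made up) tries a noncharacter and a lone surrogate against the three Unicode encoding forms.

```python
UTF_CODECS = ("utf-8", "utf-16-le", "utf-32-le")


def encodable(ch):
    """Report, per codec, whether `ch` can be encoded and decoded back."""
    outcome = {}
    for codec in UTF_CODECS:
        try:
            outcome[codec] = ch.encode(codec).decode(codec) == ch
        except UnicodeEncodeError:
            outcome[codec] = False
    return outcome


noncharacter = "\ufffe"    # a noncharacter, but still a Unicode scalar value
lone_surrogate = "\ud800"  # a surrogate code point, not a scalar value

print("noncharacter  ", encodable(noncharacter))
print("lone surrogate", encodable(lone_surrogate))
```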

As I have mentioned a few times, and as Ashok discusses in his paper,
there's no way around altering or removing [string is unicode]; no
after-the-fact justification helps.

But, agreed, that's the smallest portion of the things to do here.

rolf

Re: [TCLCORE] Unicode in Tcl 9 - a commentary and critique

From: Poor Y. <org...@po...> - 2023-02-03 21:39:10

On 2023-02-03 18:19, Donald G Porter via Tcl-Core wrote:
> On 1/27/23 10:36, apnmbx-public--- via Tcl-Core wrote:
>> 
> 
> Both 2) and 3) may impose constraints and demand revision to the 
> Tcl_Filesystem interface and its Tcl_FSMatchInDirectoryProc slot. The 
> encoding to be used to interpret the bytes of a filename might better 
> be an attribute of a Tcl_Filesystem or of a mount point rather than an 
> application-wide (and not thread-stable?) notion of a system encoding 
> pulled in through a side channel.

Even this won't solve the problem. Posix filesystems don't maintain a known
encoding as part of their configuration.  An ext4 filesystem mounted at root
may have filenames encoded in utf-8, and then another ext4 filesystem mounted
somewhere else might have filenames encoded in another encoding.  No matter
what encoding Tcl attributes to this combined set of files, it's going to be
wrong at some point.

> 
> For 1) if the alphabet for Tcl strings is larger than unicode scalar 
> values, that provides a clear use and meaning for [string is unicode] 
> which has puzzled some people.  Maybe a change to [string is usv] would 
> be clearer to the reader that the test is whether symbols outside the 
> set of unicode scalar values are present.  These are symbols that 
> cannot be properly encoded in the Unicode encodings utf-8, utf-16, 
> utf-32.

There's no need for [string is unicode], [string is usv], [string is
iso8859-1], [string is shiftjis], or any [string is any_other_encoding].
[encoding convertto] and [encoding convertfrom] already adequately cover this
functionality.  See TIP 652.

-- 
Yorick

Re: [TCLCORE] Unicode in Tcl 9 - a commentary and critique

From: Kevin K. <kev...@gm...> - 2023-02-06 04:19:20

On Fri, Feb 3, 2023 at 4:39 PM Poor Yorick <org...@po...>
wrote:

> On 2023-02-03 18:19, Donald G Porter via Tcl-Core wrote:
> > On 1/27/23 10:36, apnmbx-public--- via Tcl-Core wrote:
> >>
> >
> > Both 2) and 3) may impose constraints and demand revision to the
> > Tcl_Filesystem interface and its Tcl_FSMatchInDirectoryProc slot. The
> > encoding to be used to interpret the bytes of a filename might better
> > be an attribute of a Tcl_Filesystem or of a mount point rather than an
> > application-wide (and not thread-stable?) notion of a system encoding
> > pulled in through a side channel.
>
> Even this won't solve the problem. Posix filesystems don't maintain a known
> encoding as part of their configuration.  An ext4 filesystem mounted at root
> may have filenames encoded in utf-8, and then another ext4 filesystem mounted
> somewhere else might have filenames encoded in another encoding.  No matter
> what encoding Tcl attributes to this combined set of files, it's going to be
> wrong at some point.
>

The only known solution to that problem is to perform a temporary [encoding
system iso8859-1] prior to [open] (or any other activity manipulating a
path name), and then construct all path names by concatenating results of
[encoding convertto] before and after the offending mount point - of
course, reverting [encoding system] as soon as the path name is sent to
the OS.

That will have the effect of treating the path names as sequences of bytes
and pushing encoding management onto the user of the filesystem.
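
The idea of treating a path as bytes and decoding each part separately can be sketched as follows. This is Python, not Tcl, and it does not touch any real filesystem or any [encoding system] setting; the component names and the choice of cp1252 for the second mount are assumptions made up for the example.

```python
# Hypothetical example: two path components whose bytes were written
# under different encodings, as on two separately mounted filesystems.
components = [("r\u00e9sum\u00e9", "utf-8"), ("na\u00efve.txt", "cp1252")]
raw_path = b"/" + b"/".join(name.encode(enc) for name, enc in components)

# Decoding the whole path with a single encoding fails partway through.
try:
    raw_path.decode("utf-8")
    whole_path_decodes = True
except UnicodeDecodeError:
    whole_path_decodes = False

# ISO 8859-1 maps every byte to the code point of the same value,
# so this view of the path is lossless.
byte_view = raw_path.decode("latin-1")
lossless = byte_view.encode("latin-1") == raw_path

# The caller, who knows which mount uses which encoding, re-decodes
# each component from its original bytes.
parts = byte_view.split("/")[1:]
decoded = [
    part.encode("latin-1").decode(enc)
    for part, (_, enc) in zip(parts, components)
]

print(whole_path_decodes, lossless, decoded)
```

The burden of knowing the per-mount encoding sits entirely with the caller, which is exactly the drawback described above.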

It's truly nasty - but I don't have any better ideas unless we start having
virtual filesystems mirroring the Posix mount points - which I suppose
would be doable, but I'm not sure it's worth the effort for this one
bizarre case. (Which, by the way, is just about the only legitimate use
I've found for changing [encoding system].)
-- 
73 de ke9tv/2, Kevin

Re: [TCLCORE] Unicode in Tcl 9 - a commentary and critique

From: Jan N. <jan...@gm...> - 2023-02-05 23:24:26

On Fri, 3 Feb 2023 at 14:17, Jan Nijtmans wrote:

> Well, I know the testcases are still not handled, that's OK. But
> two things I can already report (which you - most likely -
> didn't notice yet).
>

I took the liberty to dive a little bit deeper, and fixed the
errors I noticed. Hope you like it:
     <https://core.tcl-lang.org/tcl/info/7eb72e77393970c4>

Regards,
    Jan Nijtmans

Re: [TCLCORE] Unicode in Tcl 9 - a commentary and critique

From: <apn...@ya...> - 2023-02-07 11:28:48

Jan,

I’ve moved your changes to branch jan-encoding-profile.

Will cherry pick later after further discussion.

/Ashok

From: Jan Nijtmans <jan...@gm...>
Sent: Monday, February 6, 2023 4:54 AM
To: apn...@ya...
Cc: tcl...@li...
Subject: Re: [TCLCORE] Unicode in Tcl 9 - a commentary and critique

On Fri, 3 Feb 2023 at 14:17, Jan Nijtmans wrote:

> Well, I know the testcases are still not handled, that's OK. But
> two things I can already report (which you - most likely -
> didn't notice yet).

I took the liberty to dive a little bit deeper, and fixed the
errors I noticed. Hope you like it:
     <https://core.tcl-lang.org/tcl/info/7eb72e77393970c4>

Regards,
    Jan Nijtmans

Re: [TCLCORE] Unicode in Tcl 9 - a commentary and critique

From: Jan N. <jan...@gm...> - 2023-02-07 15:09:44

On Tue, 7 Feb 2023 at 12:29, apnmbx-public wrote:

> I’ve moved your changes to branch jan-encoding-profile.
>
>
>
> Will cherry pick later after further discussion.
>

Great!   I see you are adding a lot of test-cases!

Regards,
       Jan Nijtmans

Re: [TCLCORE] Unicode in Tcl 9 - a commentary and critique

From: Harald O. <har...@el...> - 2023-01-27 17:28:47

Ashok,

thank you for the great document. I learn each day. I always used "\u" 
to have a fixed length code, as "\x" is not fixed length - ok, the same 
issue for \u and \U...

Nevertheless, it is very interesting to read all this, for instance that
TIP601 was completely reverted by TIP 346. Your chapter 5.1 default mode totally
violates TIP601 (to raise an error on any issue, when there is no 
"-nocomplain"). So, I think, TIP601 is gone and replaced by the two points:
- default to accept any error (as in 8.7)
- use strict to raise errors (as the default in TIP601)

That is ok. It just shows how the TCT wants it. Then, we can throw away
TIP601 and replace "-nocomplain" by "-strict".

I like your analysis of "-nocomplain", that it only affects codepoints 
outside the unicode range.

---

Chapter 5.2, Case 1. (2nd paragraph): Text: " For examples, code points 
higher than U+00FF are not supported in the ASCII encoding".
The part "U+00FF" should be "U+007F".
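
The boundary is easy to confirm with any ASCII encoder. The following Python check (Python's "ascii" codec, not Tcl's; the helper name fits_ascii is made up) shows that U+007F is the last code point the ASCII encoding supports.

```python
def fits_ascii(ch):
    """Return True if `ch` can be encoded with the ASCII codec."""
    try:
        ch.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


for code_point in (0x7F, 0x80, 0xFF):
    print(f"U+{code_point:04X}: {fits_ascii(chr(code_point))}")
```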

---

- Definitions: You work a lot on definitions, which is great. A list may
be added with TCL definitions including "TCL string", "TCL binary". 
Also, the following concepts may be explained: "BMP", "Surrogates", 
Encodings "utf-8", "utf-16", "CESU-8".

- about "encoding binary". Kevin Kenny once stated that "encoding
binary" and "encoding iso8859-1" are the same (except for translation and eof).
This also enlightened me.

---

It is quite hard to realize that I have now been working on this
subject for 8 months and we have a very inconsistent and contradictory
result. Well, we keep going...

Thank you all,
Harald

On 27.01.2023 at 16:36, apnmbx-public--- via Tcl-Core wrote:
> I’ve written up my view of “state of Unicode in Tcl 9” at
> https://www.magicsplat.com/tcl9/tcl9unicode.html
> 
> My hope is that this will (a) serve as a tutorial for those not familiar 
> with the issues around Unicode (one-eyed leading the blind and all that) 
> and (b) prompt a broader discussion around the issues raised in the 
> mailing list and tickets.
> 
> A summary TOC is below. I hope this prods more folks in the TCT (and 
> outside) to weigh in with their opinions one way or the other.
> 
> Apologies for the length of the document but it’s not easy to summarise.
> 
> /Ashok
> 
>   * 1 About this document
>     <https://www.magicsplat.com/tcl9/tcl9unicode.html#about-this-document>
>   * 2 Background
>     <https://www.magicsplat.com/tcl9/tcl9unicode.html#background>
>   * 3 Tcl strings
>     <https://www.magicsplat.com/tcl9/tcl9unicode.html#tcl-strings>
>       o 3.1 ASCII escape sequences for non-ASCII code points
>         <https://www.magicsplat.com/tcl9/tcl9unicode.html#ascii-escape-sequences-for-non-ascii-code-points>
>       o 3.2 Binary strings
>         <https://www.magicsplat.com/tcl9/tcl9unicode.html#binary-strings>
>       o 3.3 Issues in string definition
>         <https://www.magicsplat.com/tcl9/tcl9unicode.html#issues-in-string-definition>
>           + 3.3.1 No definition of what constitutes a Tcl string
>             <https://www.magicsplat.com/tcl9/tcl9unicode.html#no-definition-of-what-constitutes-a-tcl-string>
>           + 3.3.2 Inconsistent handling for out of range code points
>             <https://www.magicsplat.com/tcl9/tcl9unicode.html#inconsistent-handling-for-out-of-range-code-points>
>           + 3.3.3 Surrogates as literals
>             <https://www.magicsplat.com/tcl9/tcl9unicode.html#surrogates-as-literals>
>           + 3.3.4 Variable length escape sequences
>             <https://www.magicsplat.com/tcl9/tcl9unicode.html#variable-length-escape-sequences>
>   * 4 String commands
>     <https://www.magicsplat.com/tcl9/tcl9unicode.html#string-commands>
>       o 4.1 String classification
>         <https://www.magicsplat.com/tcl9/tcl9unicode.html#string-classification>
>       o 4.2 Issues in string commands
>         <https://www.magicsplat.com/tcl9/tcl9unicode.html#issues-in-string-commands>
>           + 4.2.1 string is unicode
>             <https://www.magicsplat.com/tcl9/tcl9unicode.html#string-is-unicode>
>           + 4.2.2 Nonconformant interpretation of string values
>             <https://www.magicsplat.com/tcl9/tcl9unicode.html#nonconformant-interpretation-of-string-values>
>   * 5 Encoding transforms
>     <https://www.magicsplat.com/tcl9/tcl9unicode.html#encoding-transforms>
>       o 5.1 Transforming encoded byte sequences to Tcl strings
>         <https://www.magicsplat.com/tcl9/tcl9unicode.html#transforming-encoded-byte-sequences-to-tcl-strings>
>       o 5.2 Transforming Tcl strings to encoded byte sequences
>         <https://www.magicsplat.com/tcl9/tcl9unicode.html#transforming-tcl-strings-to-encoded-byte-sequences>
>       o 5.3 Issues in encoding transforms
>         <https://www.magicsplat.com/tcl9/tcl9unicode.html#issues-in-encoding-transforms>
>           + 5.3.1 Only partial support for conforming error handling
>             behavior
>             <https://www.magicsplat.com/tcl9/tcl9unicode.html#only-partial-support-for-conforming-error-handling-behavior>
>           + 5.3.2 Error handling options are incomplete and inconsistent
>             <https://www.magicsplat.com/tcl9/tcl9unicode.html#error-handling-options-are-incomplete-and-inconsistent>
>           + 5.3.3 Default handling of invalid bytes is neither
>             conformant nor consistent
>             <https://www.magicsplat.com/tcl9/tcl9unicode.html#default-handling-of-invalid-bytes-is-neither-conformant-nor-consistent>
>           + 5.3.4 No support for lossless operation
>             <https://www.magicsplat.com/tcl9/tcl9unicode.html#no-support-for-lossless-operation>
>           + 5.3.5 Default encoder handling should be strict conformance
>             <https://www.magicsplat.com/tcl9/tcl9unicode.html#default-encoder-handling-should-be-strict-conformance>
>           + 5.3.6 -failindex does not distinguish errors from incomplete
>             sequences
>             <https://www.magicsplat.com/tcl9/tcl9unicode.html#failindex-does-not-distinguish-errors-from-incomplete-sequences>
>           + 5.3.7 Inconsistency in default handling of surrogates
>             <https://www.magicsplat.com/tcl9/tcl9unicode.html#inconsistency-in-default-handling-of-surrogates>
>           + 5.3.8 Inconsistency between error handling for different
>             encodings
>             <https://www.magicsplat.com/tcl9/tcl9unicode.html#inconsistency-between-error-handling-for-different-encodings>
>           + 5.3.9 Manpages for encoding have errors
>             <https://www.magicsplat.com/tcl9/tcl9unicode.html#manpages-for-encoding-have-errors>
>   * 6 Input and Output
>     <https://www.magicsplat.com/tcl9/tcl9unicode.html#input-and-output>
>       o 6.1 Input from channels
>         <https://www.magicsplat.com/tcl9/tcl9unicode.html#input-from-channels>
>           + 6.1.1 Blocking read
>             <https://www.magicsplat.com/tcl9/tcl9unicode.html#blocking-read>
>           + 6.1.2 Non-blocking read
>             <https://www.magicsplat.com/tcl9/tcl9unicode.html#non-blocking-read>
>           + 6.1.3 Blocking gets
>             <https://www.magicsplat.com/tcl9/tcl9unicode.html#blocking-gets>
>           + 6.1.4 Non-blocking gets
>             <https://www.magicsplat.com/tcl9/tcl9unicode.html#non-blocking-gets>
>       o 6.2 Output on channels
>         <https://www.magicsplat.com/tcl9/tcl9unicode.html#output-on-channels>
>       o 6.3 Binary channels
>         <https://www.magicsplat.com/tcl9/tcl9unicode.html#binary-channels>
>       o 6.4 File paths and system interfaces
>         <https://www.magicsplat.com/tcl9/tcl9unicode.html#file-paths-and-system-interfaces>
>       o 6.5 Issues in I/O and system interfaces
>         <https://www.magicsplat.com/tcl9/tcl9unicode.html#issues-in-io-and-system-interfaces>
>           + 6.5.1 Behavior of read violates defined semantics
>             <https://www.magicsplat.com/tcl9/tcl9unicode.html#behavior-of-read-violates-defined-semantics>
>           + 6.5.2 Channel read state after errors
>             <https://www.magicsplat.com/tcl9/tcl9unicode.html#channel-read-state-after-errors>
>           + 6.5.3 Channel write state after errors
>             <https://www.magicsplat.com/tcl9/tcl9unicode.html#channel-write-state-after-errors>
>           + 6.5.4 File and system APIs are not lossless
>             <https://www.magicsplat.com/tcl9/tcl9unicode.html#file-and-system-apis-are-not-lossless>
>           + 6.5.5 No error raised for conflicting options
>             <https://www.magicsplat.com/tcl9/tcl9unicode.html#no-error-raised-for-conflicting-options>
