From: Harald O. <har...@el...> - 2023-01-27 17:28:47
Ashok,

thank you for the great document. I learn something each day.

I always used "\u" to have a fixed-length code, as "\x" is not fixed length. OK, the same issue exists for \u and \U... Nevertheless, it is very interesting to read all this.

TIP 601 was completely reverted by TIP 346. Your chapter 5.1 default mode totally violates TIP 601 (which raises an error on any issue when there is no "-nocomplain"). So I think TIP 601 is gone and replaced by these two points:
- default to accept any error (as in 8.7)
- use strict to raise errors (as the default in TIP 601)
That is OK. It just shows how the TCT wants it. Then we can throw away TIP 601 and replace "-nocomplain" by "-strict".

I like your analysis of "-nocomplain", that it only affects code points outside the Unicode range.

---
Chapter 5.2, Case 1 (2nd paragraph):
Text: "For examples, code points higher than U+00FF are not supported in the ASCII encoding".
The part "U+00FF" should be "U+007F".
---
- Definitions: You work a lot on definitions, which is great. A list of Tcl definitions may be added, including "TCL string" and "TCL binary". The following concepts may also be explained: "BMP", "Surrogates", and the encodings "utf-8", "utf-16", "CESU-8".
- About "encoding binary": Kevin Kenny once stated that "encoding binary" and "encoding iso8859-1" are the same (apart from translation and eof). This also enlightened me.
---
It is quite hard to realize that I have now worked on this subject for 8 months and we have a very inconsistent and contradictory result. Well, we keep going...
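As a rough illustration of the U+007F boundary and of the terms suggested for the definitions list, here is a small Python sketch. Python is used only because its codecs are easy to run; it is not Tcl's implementation and says nothing about how Tcl 9 behaves. The helper names (fits_ascii, utf16_units, cesu8, latin1_is_identity) are made up for this example.

```python
# Illustration only: Python codecs stand in for the general encoding concepts.
# All function names here are hypothetical helpers, not Tcl or Python APIs.

def fits_ascii(ch):
    """True if the code point can be encoded as ASCII (U+0000..U+007F)."""
    try:
        ch.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

def utf16_units(ch):
    """Return the UTF-16 code units of one code point as a list of ints."""
    raw = ch.encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]

def cesu8(ch):
    """CESU-8: encode each UTF-16 code unit separately in UTF-8 style,
    so a code point outside the BMP becomes two 3-byte surrogate sequences."""
    return b"".join(chr(u).encode("utf-8", "surrogatepass")
                    for u in utf16_units(ch))

def latin1_is_identity():
    """ISO 8859-1 maps byte value N to code point N for all 256 byte values."""
    data = bytes(range(256))
    text = data.decode("latin-1")
    return ([ord(c) for c in text] == list(range(256))
            and text.encode("latin-1") == data)

if __name__ == "__main__":
    print(fits_ascii("\u007f"), fits_ascii("\u0080"))      # last ASCII code point vs first non-ASCII
    print([hex(u) for u in utf16_units("\U0001F600")])     # surrogate pair for a non-BMP code point
    print("\U0001F600".encode("utf-8").hex(), cesu8("\U0001F600").hex())
    print(latin1_is_identity())
```

The last helper is the reason an ISO 8859-1 conversion can behave like a byte-for-byte pass-through: every byte value has exactly one code point with the same number.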
Thank you all,
Harald

On 27.01.2023 at 16:36, apnmbx-public--- via Tcl-Core wrote:
> I’ve written up my view of “state of Unicode in Tcl 9” at
> https://www.magicsplat.com/tcl9/tcl9unicode.html
>
> My hope is that this will (a) serve as a tutorial for those not familiar
> with the issues around Unicode (one-eyed leading the blind and all that)
> and (b) prompt a broader discussion around the issues raised in the
> mailing list and tickets.
>
> A summary TOC is below. I hope this prods more folks in the TCT (and
> outside) to weigh in with their opinions one way or the other.
>
> Apologies for the length of the document but it’s not easy to summarise.
>
> /Ashok
>
> * 1 About this document
>   <https://www.magicsplat.com/tcl9/tcl9unicode.html#about-this-document>
> * 2 Background
>   <https://www.magicsplat.com/tcl9/tcl9unicode.html#background>
> * 3 Tcl strings
>   <https://www.magicsplat.com/tcl9/tcl9unicode.html#tcl-strings>
>     o 3.1 ASCII escape sequences for non-ASCII code points
>       <https://www.magicsplat.com/tcl9/tcl9unicode.html#ascii-escape-sequences-for-non-ascii-code-points>
>     o 3.2 Binary strings
>       <https://www.magicsplat.com/tcl9/tcl9unicode.html#binary-strings>
>     o 3.3 Issues in string definition
>       <https://www.magicsplat.com/tcl9/tcl9unicode.html#issues-in-string-definition>
>         + 3.3.1 No definition of what constitutes a Tcl string
>           <https://www.magicsplat.com/tcl9/tcl9unicode.html#no-definition-of-what-constitutes-a-tcl-string>
>         + 3.3.2 Inconsistent handling for out of range code points
>           <https://www.magicsplat.com/tcl9/tcl9unicode.html#inconsistent-handling-for-out-of-range-code-points>
>         + 3.3.3 Surrogates as literals
>           <https://www.magicsplat.com/tcl9/tcl9unicode.html#surrogates-as-literals>
>         + 3.3.4 Variable length escape sequences
>           <https://www.magicsplat.com/tcl9/tcl9unicode.html#variable-length-escape-sequences>
> * 4 String commands
>   <https://www.magicsplat.com/tcl9/tcl9unicode.html#string-commands>
>     o 4.1 String classification
>       <https://www.magicsplat.com/tcl9/tcl9unicode.html#string-classification>
>     o 4.2 Issues in string commands
>       <https://www.magicsplat.com/tcl9/tcl9unicode.html#issues-in-string-commands>
>         + 4.2.1 string is unicode
>           <https://www.magicsplat.com/tcl9/tcl9unicode.html#string-is-unicode>
>         + 4.2.2 Nonconformant interpretation of string values
>           <https://www.magicsplat.com/tcl9/tcl9unicode.html#nonconformant-interpretation-of-string-values>
> * 5 Encoding transforms
>   <https://www.magicsplat.com/tcl9/tcl9unicode.html#encoding-transforms>
>     o 5.1 Transforming encoded byte sequences to Tcl strings
>       <https://www.magicsplat.com/tcl9/tcl9unicode.html#transforming-encoded-byte-sequences-to-tcl-strings>
>     o 5.2 Transforming Tcl strings to encoded byte sequences
>       <https://www.magicsplat.com/tcl9/tcl9unicode.html#transforming-tcl-strings-to-encoded-byte-sequences>
>     o 5.3 Issues in encoding transforms
>       <https://www.magicsplat.com/tcl9/tcl9unicode.html#issues-in-encoding-transforms>
>         + 5.3.1 Only partial support for conforming error handling behavior
>           <https://www.magicsplat.com/tcl9/tcl9unicode.html#only-partial-support-for-conforming-error-handling-behavior>
>         + 5.3.2 Error handling options are incomplete and inconsistent
>           <https://www.magicsplat.com/tcl9/tcl9unicode.html#error-handling-options-are-incomplete-and-inconsistent>
>         + 5.3.3 Default handling of invalid bytes is neither conformant nor consistent
>           <https://www.magicsplat.com/tcl9/tcl9unicode.html#default-handling-of-invalid-bytes-is-neither-conformant-nor-consistent>
>         + 5.3.4 No support for lossless operation
>           <https://www.magicsplat.com/tcl9/tcl9unicode.html#no-support-for-lossless-operation>
>         + 5.3.5 Default encoder handling should be strict conformance
>           <https://www.magicsplat.com/tcl9/tcl9unicode.html#default-encoder-handling-should-be-strict-conformance>
>         + 5.3.6 -failindex does not distinguish errors from incomplete sequences
>           <https://www.magicsplat.com/tcl9/tcl9unicode.html#failindex-does-not-distinguish-errors-from-incomplete-sequences>
>         + 5.3.7 Inconsistency in default handling of surrogates
>           <https://www.magicsplat.com/tcl9/tcl9unicode.html#inconsistency-in-default-handling-of-surrogates>
>         + 5.3.8 Inconsistency between error handling for different encodings
>           <https://www.magicsplat.com/tcl9/tcl9unicode.html#inconsistency-between-error-handling-for-different-encodings>
>         + 5.3.9 Manpages for encoding have errors
>           <https://www.magicsplat.com/tcl9/tcl9unicode.html#manpages-for-encoding-have-errors>
> * 6 Input and Output
>   <https://www.magicsplat.com/tcl9/tcl9unicode.html#input-and-output>
>     o 6.1 Input from channels
>       <https://www.magicsplat.com/tcl9/tcl9unicode.html#input-from-channels>
>         + 6.1.1 Blocking read
>           <https://www.magicsplat.com/tcl9/tcl9unicode.html#blocking-read>
>         + 6.1.2 Non-blocking read
>           <https://www.magicsplat.com/tcl9/tcl9unicode.html#non-blocking-read>
>         + 6.1.3 Blocking gets
>           <https://www.magicsplat.com/tcl9/tcl9unicode.html#blocking-gets>
>         + 6.1.4 Non-blocking gets
>           <https://www.magicsplat.com/tcl9/tcl9unicode.html#non-blocking-gets>
>     o 6.2 Output on channels
>       <https://www.magicsplat.com/tcl9/tcl9unicode.html#output-on-channels>
>     o 6.3 Binary channels
>       <https://www.magicsplat.com/tcl9/tcl9unicode.html#binary-channels>
>     o 6.4 File paths and system interfaces
>       <https://www.magicsplat.com/tcl9/tcl9unicode.html#file-paths-and-system-interfaces>
>     o 6.5 Issues in I/O and system interfaces
>       <https://www.magicsplat.com/tcl9/tcl9unicode.html#issues-in-io-and-system-interfaces>
>         + 6.5.1 Behavior of read violates defined semantics
>           <https://www.magicsplat.com/tcl9/tcl9unicode.html#behavior-of-read-violates-defined-semantics>
>         + 6.5.2 Channel read state after errors
>           <https://www.magicsplat.com/tcl9/tcl9unicode.html#channel-read-state-after-errors>
>         + 6.5.3 Channel write state after errors
>           <https://www.magicsplat.com/tcl9/tcl9unicode.html#channel-write-state-after-errors>
>         + 6.5.4 File and system APIs are not lossless
>           <https://www.magicsplat.com/tcl9/tcl9unicode.html#file-and-system-apis-are-not-lossless>
>         + 6.5.5 No error raised for conflicting options
>           <https://www.magicsplat.com/tcl9/tcl9unicode.html#no-error-raised-for-conflicting-options>