From: Harald O. <har...@el...> - 2025-11-11 13:17:54
Hi Andreas,

thank you for the pointer. The Wiki page is no longer correct, as Tcl 9
no longer uses CESU-8 internally.
Nevertheless, the name "TUTF-8" is great for clarification in our
documentation.

If the documentation change gets to 8.6, the CESU-8 issue should also be
evaluated.

Take care,
Harald

On 11.11.2025 at 14:06, Andreas Kupries via Tcl-Core wrote:
> Note https://en.wikipedia.org/wiki/UTF-8#Modified_UTF-8
>
>     Java internally uses UTF-16 for the /char/ data type and,
>     consequentially, the /Character/, /String/, and the /StringBuffer/
>     classes, but for I/O uses /Modified UTF-8/ (MUTF-8), in which the
>     null character U+0000 uses the two-byte overlong encoding
>     0xC0, 0x80, instead of just 0x00.
>
> And:
>
>     Tcl also uses the same modified UTF-8 as Java for internal
>     representation of Unicode data, but uses strict CESU-8 for
>     external data.
>
> (https://en.wikipedia.org/wiki/CESU-8)
>
> On Tue, Nov 11, 2025 at 2:00 PM Pietro Cerutti via Tcl-Core
> <tcl-co...@li...> wrote:
>
> > On Nov 11 2025, 05:33 +0000, apnmbx-public--- via Tcl-Core
> > <tcl-co...@li...> wrote:
> >
> > > The branch [1]apn-doc-update contains manpage updates addressing
> > > two areas:
> > >
> > >   ● added a section in Tcl.n that defines Tcl string value as a
> > >     sequence of Unicode code points.
> > >   ● updates to various command and C API pages that wrongly
> > >     identify Tcl’s internal format as UTF-8. For this purpose the
> > >     encoding name TUTF-8 has been introduced to reference Tcl’s
> > >     internal modified UTF-8 format.
> > >
> > > Reviews appreciated and improvements welcome. Both have been a pet
> > > peeve with me for a long time (and probably no one else!) in that
> > > the first is important missing information and the second is
> > > misinformation.
> >
> > Would it make sense to describe TUTF-8 in its own dedicated man page
> > and refer to it, instead of duplicating the description across
> > different man pages?
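
For readers unfamiliar with the overlong NUL encoding mentioned in the quoted Wikipedia text (U+0000 written as 0xC0, 0x80 instead of 0x00), the following is a minimal Python sketch. The function names are hypothetical, and this is not Tcl's or Java's actual implementation: it covers only the NUL rule and leaves every other code point as standard UTF-8, on the assumption (per the note above about Tcl 9) that no CESU-8 surrogate-pair encoding is involved.

```python
def encode_nul_overlong(text: str) -> bytes:
    """Encode text as UTF-8, but write U+0000 as the overlong pair C0 80.

    Hypothetical helper for illustration only.
    """
    # In standard UTF-8 the byte 0x00 occurs only as the encoding of U+0000,
    # so a plain byte replacement cannot damage any other character.
    return text.encode("utf-8").replace(b"\x00", b"\xc0\x80")


def decode_nul_overlong(data: bytes) -> str:
    """Reverse encode_nul_overlong (hypothetical helper)."""
    # 0xC0 never appears in valid standard UTF-8, so the pair C0 80 can only
    # be the overlong NUL produced by the encoder above.
    return data.replace(b"\xc0\x80", b"\x00").decode("utf-8")


if __name__ == "__main__":
    sample = "a\x00b"
    encoded = encode_nul_overlong(sample)
    print(encoded.hex(" "))
    print(decode_nul_overlong(encoded) == sample)
```

The practical point of the rule is that the encoded bytes never contain 0x00, so a string with embedded NULs can still be stored in a NUL-terminated C buffer.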