From: Poor Y. <org...@po...> - 2023-02-03 21:39:10
On 2023-02-03 18:19, Donald G Porter via Tcl-Core wrote:
> On 1/27/23 10:36, apnmbx-public--- via Tcl-Core wrote:
>>
>
> Both 2) and 3) may impose constraints and demand revision to the
> Tcl_Filesystem interface and its Tcl_FSMatchInDirectoryProc slot. The
> encoding to be used to interpret the bytes of a filename might better
> be an attribute of a Tcl_Filesystem or of a mount point rather than an
> application-wide (and not thread-stable?) notion of a system encoding
> pulled in through a side channel.

Even this won't solve the problem. POSIX filesystems don't maintain a
known encoding as part of their configuration. An ext4 filesystem
mounted at root may have filenames encoded in utf-8, while another ext4
filesystem mounted somewhere else may have filenames in a different
encoding. No matter what single encoding Tcl attributes to this
combined set of files, it's going to be wrong at some point.

>
> For 1) if the alphabet for Tcl strings is larger than unicode scalar
> values, that provides a clear use and meaning for [string is unicode]
> which has puzzled some people. Maybe a change to [string is usv] would
> be clearer to the reader that the test is whether symbols outside the
> set of unicode scalar values are present. These are symbols that
> cannot be properly encoded in the Unicode encodings utf-8, utf-16,
> utf-32.

There's no need for [string is unicode], [string is usv],
[string is iso8859-1], [string is shiftjis], or any
[string is any_other_encoding]. [encoding convertto] and
[encoding convertfrom] already adequately cover this functionality.
See TIP 652.

--
Yorick
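
To illustrate the point about [encoding convertto] and [encoding convertfrom], here is a minimal sketch of a round-trip check. The proc name "representable" is hypothetical and is not part of Tcl or of any TIP. It assumes that a string counts as belonging to an encoding when it survives conversion to bytes and back unchanged. The [catch] calls are there because some Tcl versions raise an error on unencodable input while others silently substitute a replacement character. Results for edge cases such as lone surrogates may differ between Tcl versions.

```tcl
# Hypothetical helper: report whether string s can be represented
# losslessly in the named encoding, using only the encoding command.
proc representable {enc s} {
    # Strict conversions may throw on unencodable characters.
    if {[catch {encoding convertto $enc $s} bytes]} {
        return 0
    }
    if {[catch {encoding convertfrom $enc $bytes} back]} {
        return 0
    }
    # Lenient conversions substitute a replacement character instead,
    # so compare the round-tripped string with the original.
    return [string equal $s $back]
}

puts [representable iso8859-1 "caf\u00e9"]
puts [representable iso8859-1 "\u20ac"]
```

Roughly the same test that a [string is iso8859-1] would perform is thus available for any encoding listed by [encoding names], without adding a new [string is] class per encoding.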