From: Georg L. <jor...@ma...> - 2024-07-05 10:26:32
My 2c:

One has always had to handle encoding and end-of-line translation
explicitly for cross-platform compatibility. What I understand from your
contribution is that this will remain so whether or not the
ActiveCodePage manifest property is set. That does not surprise me; we
include legacy systems in our scope.

The new thing seems to be: if Windows users on newer systems already use
UTF-8 as their encoding, they would benefit, with respect to
cross-platform compatibility, from the ActiveCodePage setting in
tclsh/wish for everything done from now on.

Seems like a win to me.

Best Regards,
Georg

On 6/30/24 17:39, apnmbx-public--- via Tcl-Core wrote:
>
> I realize everyone is tired of encoding issues but I feel this is
> important given the consequences ...
>
> One of the changes in Tcl 9 is to add a manifest property on Windows
> which effectively sets [encoding system] to UTF-8 irrespective of the
> user's code page setting. This manifest property only takes effect on
> Windows 10 Build 1903 and later. Earlier Windows systems ignore the
> setting and Tcl 9 will use the user's code page setting on those
> platforms.
>
> This change, presumably made as part of TIP 587, has consequences
> serious enough to be reverted in my opinion.
>
> Below, "file" refers to a file containing non-ASCII content that is
> read/written with the (default) system encoding. And "cannot be
> read" means either an encoding error is thrown or data is garbled
> depending on the combination of Tcl version, platform version and
> system encoding (!).
>
> - A file written by an 8.x tclsh cannot be read by a 9.x tclsh and vice
> versa, even *on the same exact system*, on modern (build 1903 or
> later) Windows systems. So for example, on US English (cp1252) systems
> "set fd [open foo.txt w]; puts $fd \xa9; close $fd" in Tcl 8 will
> result in a foo.txt that will raise an error in Tcl 9's readFile (or
> the longer open/read/close equivalent). This is not acceptable in my
> view.
>
> - Along the same lines, a file written by a 9.x tclsh on pre-1903
> Windows 10 cannot be read by the **same** 9.x tclsh on the **same**
> system after a **Windows** update. Also not acceptable.
>
> - A file written by a 9.x tclsh on Windows 7 or 8 cannot be read by a
> 9.x tclsh on Windows 10 1903 and vice versa even when the two share a
> code page. May be less serious but still undesirable.
>
> - A third-party application (not using tclsh) that uses Tcl as its
> scripting language will not be able to read files written by tclsh and
> vice versa unless it also includes the utf-8 magic line in its
> manifest. The latter is unlikely as it would raise the same
> compatibility problems for them as for tclsh above. Also undesirable.
>
> In my view, these are all serious compatibility problems.
>
> The original motivation for the change in [encoding system] presumably
> came from this page -
> https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page.
> However, note the target audience is really applications that use the
> ANSI APIs, and its usefulness is limited for applications (like Tcl)
> that use the wide-character APIs. Moreover, as stated on that page,
> if targeting platforms prior to Build 1903, "you must handle legacy
> code page detection and conversion as usual".
>
> *Given the above, I would propose reverting [encoding system] on
> Windows to go back to defaulting to the encoding as per the user's
> code page setting as in Tcl 8.* Note this is not a code change, it is
> a change to the manifest resource in tclsh (and wish) to remove the
> ActiveCodePage property. The right place to move forward with UTF-8 as
> the default encoding is the *applications* written in Tcl, not
> Tcl itself.
>
> (Note: I am not proposing changing the encoding used by the source
> command)
>
> What would be the drawbacks of reverting this change? One could argue
> that it would reduce compatibility with other platforms like Unix which
> use UTF-8 as the system encoding. However, I would argue that
>
> - this is no different from what's the case with 8.x today
>
> - sharing of files between platforms really has to include explicit
> configuration of encoding as good practice and most cross-platform
> applications will do that
>
> - the issues listed above for a single platform, on the very same
> system even, are much more serious
>
> Reverting would probably need to be TIP'ed but I wanted to get
> feedback before embarking on that path.
>
> With regards to Unix, I do not have enough experience with encodings
> on Unix to know if similar issues would arise there. The [encoding
> system] in Tcl 9 has changed from iso8859-1 to utf-8. Someone should
> examine this closer.
>
> Comments please. I’d prefer to avoid TIP overhead if someone can spot
> holes in the above.
>
> /Ashok
>
>
>
> _______________________________________________
> Tcl-Core mailing list
> Tcl...@li...
> https://lists.sourceforge.net/lists/listinfo/tcl-core
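
To illustrate the point made in this thread about configuring the encoding explicitly rather than relying on [encoding system], here is a minimal Tcl sketch. The helper names writeText and readText are hypothetical (they are not part of Tcl), and the choice of utf-8 with lf line endings is only an assumption for the example. The sketch should behave the same under Tcl 8.6 and Tcl 9, whatever the manifest or code page setting.

```tcl
# Sketch only: writeText and readText are hypothetical helper names,
# not part of Tcl. They pin the encoding and line endings explicitly
# instead of relying on the system encoding.
proc writeText {path text {enc utf-8}} {
    set fd [open $path w]
    fconfigure $fd -encoding $enc -translation lf
    puts -nonewline $fd $text
    close $fd
}

proc readText {path {enc utf-8}} {
    set fd [open $path r]
    fconfigure $fd -encoding $enc -translation lf
    set text [read $fd]
    close $fd
    return $text
}

# U+00A9 (copyright sign) is one byte in cp1252 but two bytes in UTF-8,
# so a file written under one system encoding misreads under the other.
set cp1252Len [string length [encoding convertto cp1252 "\u00a9"]]
set utf8Len [string length [encoding convertto utf-8 "\u00a9"]]

# Round trip with an explicit encoding on both sides.
set path foo.txt
writeText $path "\u00a9 2024"
set roundTrip [readText $path]
```

Because both helpers set -encoding themselves, the file's byte content does not depend on which Tcl version or Windows build wrote it; only files that were written with the implicit system encoding are exposed to the mismatch described above.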