|
From: Jan N. <jan...@gm...> - 2023-01-27 21:22:05
|
On Fri, 27 Jan 2023 at 16:37, apnmbx-public wrote:
> I’ve written up my view of “state of Unicode in Tcl 9” at
> https://www.magicsplat.com/tcl9/tcl9unicode.html
>
From this document:
% encoding convertto -strict ascii \uE0
?
The character U+E0 is not representable in ASCII so in the presence of
the -strict option, the above command should have raised an exception
Reading this, I was about to file a bug report. But then I tried it:
$ tclsh9.0
% encoding convertto -strict ascii \uE0
unexpected character at index 0: 'U+0000E0'
So, I am wondering which version of Tcl 9.0 you were testing.
My other remark is that you don't handle Tcl 8.7. One of the challenges
in the TIPs was to provide a smooth upgrade path from Tcl 8.6 through
8.7 to 9.0. Therefore, -strict sometimes does nothing in Tcl 9.0, which
might be for 2 different reasons:
1) The encoding in Tcl 8.6 already did all the 'strict' checks, so
specifying -strict adds nothing.
2) Outputting lone surrogates is simply illegal in utf-8/-16/-32,
therefore Tcl 9.0 always throws an exception whether -strict is
specified or not.
I'm sure that this reaction will trigger other reactions ..... that's
fine. But I hope that real inconsistencies will lead to bug reports, or
(even better) patch submissions. You all are capable of reporting more
complaints than I can handle. Many are simply wrong (in my opinion, but
you have the right to disagree on that), but I'm doing my best to
filter out the real ones. Reporting bugs multiple times doesn't help to
get them fixed faster, and neither does duplicating them on the Tcl
Core list and the Tcl chat.
Have a nice weekend!
Jan Nijtmans
|
|
From: Jan N. <jan...@gm...> - 2023-01-27 22:08:36
|
On Fri, 27 Jan 2023 at 22:21, Jan Nijtmans wrote:
> So, I am wondering which version of Tcl 9.0 you were testing.
>
Another example from the document:
For the second case, the default handling depends on the encoding being
used.
% encoding convertto ascii \uDC00
?
% encoding convertto utf-8 \uDC00
unexpected character at index 0: 'U+00DC00'
Notice how the treatment of surrogates differs between the two
encodings.
If I try this:
$ tclsh9.0
% encoding convertto ascii \uDC00
unexpected character at index 0: 'U+00DC00'
% encoding convertto utf-8 \uDC00
unexpected character at index 0: 'U+00DC00'
Again, which version of Tcl 9.0 are you testing?
Hope this helps,
Jan Nijtmans
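Since the two sessions above disagree, it may help to run the same probes on the build in question and record the results next to the version string. What follows is only a sketch: `probe` is a hypothetical helper name, and the script assumes a tclsh that accepts the `-strict` option as used in this thread. On a build without that option, the first probe reports a usage error rather than an encoding error.

```tcl
# Sketch only. "probe" is a hypothetical helper, not part of Tcl.
# It runs [encoding convertto] with the given arguments and reports either
# the error message or the resulting bytes in hex.
proc probe {args} {
    if {[catch {encoding convertto {*}$args} result]} {
        return "error: $result"
    }
    return "bytes: [binary encode hex $result]"
}

puts "Tcl [info patchlevel]"
# Each label is the literal command text; the matching list holds the real
# arguments, with \uXXXX already substituted by the Tcl parser.
foreach {label arguments} [list \
    {-strict ascii \uE0} [list -strict ascii \uE0] \
    {ascii \uDC00}       [list ascii \uDC00] \
    {utf-8 \uDC00}       [list utf-8 \uDC00] \
] {
    puts "encoding convertto $label -> [probe {*}$arguments]"
}
```

Printing the patch level first makes it easier to compare results obtained from different snapshots.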
|
|
From: <apn...@ya...> - 2023-01-28 08:26:47
|
Jan,
Regarding the version of Tcl tested – I’m sure it was trunk (not any
branch) but not sure about the exact repository version. I updated my
pool post write-up so I’m not sure there’s a way to tell, and it is not
worth the time to bisect. But I tried it again, downloading the
following file:
https://sourceforge.net/projects/magicsplat/files/barebones-tcl/tcl9.0-dev/tclsh9.0alpha-dev-snapshot-win-x64-sfe-20230117.exe/download
which is the Jan 17 build artifact from the Tcl github actions.
% encoding convertto -strict ascii \uE0
?
So I would assume that this behavior was exhibited at least as of Jan
17. Perhaps it has been fixed in the repository since then. Or your work
tree has other changes not committed. I wanted to keep a stable pool
while I plowed through my write-up so have not picked up the latest
changes, if any.
In any case, I think that issue is relatively minor and may be regarded
simply as a bug to be fixed. I would prefer to see the larger issues in
the document addressed.
With regard to not handling Tcl 8.7, let me reiterate the purpose of my
write-up. It expresses *my* views on how Tcl should behave wrt Unicode
handling accompanied by a rationale and contrasts it with how Tcl 9
actually behaves. The intent was to draw more people outside of
yourself, Rolf, Harald and Nathan into the discussion. I don’t think
filing tickets and bug reports accomplishes that. My belief is there
should first be some consensus (or a majority opinion at least) on the
desired *behavior* before looking into how it can be implemented in 9 or
8.7. (As an aside, your remark also reflects on the impact 8.7 has on
9.0 in terms of brain cycles consumed as well as a non-optimal end
result.)
With respect to bug reports and patches as you suggest, as I have stated
several times before, I am happy to contribute patches and bug fixes.
Neither I, nor others (Nathan already has branches implementing his
opinions) expect you to do all the work (you’ve already done far more
than can be expected and there would be no Tcl 9 without your effort, so
thank you). But first there needs to be agreement on the issues I and
others have raised. If you, and through their silence, the TCT have
pretty much decided that the current -strict et al design (which for
example, is not just a “bug fix”) is a done deal, there is no point in
raising tickets or contributing patches. Or further TIPs for that
matter.
/Ashok
|
|
From: Rolf A. <tcl...@po...> - 2023-01-29 19:10:41
|
apnmbx-public--- via Tcl-Core writes:
> With regard to not handling Tcl 8.7, let me reiterate the purpose of
> my write-up. It expresses *my* views on how Tcl should behave wrt
> Unicode handling accompanied by a rationale and contrasts it with how
> Tcl 9 actually behaves. The intent was to draw more people outside of
> yourself, Rolf, Harald and Nathan into the discussion. I don’t think
> filing tickets and bug reports accomplishes that. My belief is there
> should first be some consensus (or a majority opinion at least) on the
> desired *behavior* before looking into how it can be implemented in 9
> or 8.7. (As an aside, your remark also reflects on the impact 8.7 has
> on 9.0 in terms of brain cycles consumed as well as a non-optimal end
> result.)
Although I'm just one of the usual suspects, allow me to thank Ashok for
his work on this document. What I find especially important is that
Ashok did a lot of definition work and clearly described some of the
areas which need a *high level* decision about how things should work.
> With respect to bug reports and patches as you suggest, as I have
> stated several times before, I am happy to contribute patches and bug
> fixes. Neither I, nor others (Nathan already has branches implementing
> his opinions) expect you to do all the work (you’ve already done far
> more than can be expected and there would be no Tcl 9 without your
> effort, so thank you). But first there needs to be agreement on the
> issues I and others have raised. If you, and through their silence,
> the TCT have pretty much decided that the current -strict et al design
> (which for example, is not just a “bug fix”) is a done deal, there is
> no point in raising tickets or contributing patches. Or further TIPs
> for that matter.
I second this. There is not much point in opening a ticket about a
behaviour if it gets closed quickly. There's no other way than to
discuss this here.
rolf
|
|
From: <apn...@ya...> - 2023-01-28 08:30:20
|
Here too, from the Jan 17 build artifact,
% encoding convertto ascii \uDC00
?
% encoding convertto utf-8 \uDC00
unexpected character at index 0: 'U+00DC00'
I thought there was a way to get the fossil commit id from script but can’t seem to find it.
/Ashok
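On the question of reading the commit id from a script, one possibility is sketched below. It rests on an assumption: that the build provides the `tcl::build-info` command (TIP 599), which may not be true of every snapshot. `describeBuild` is a hypothetical helper name, and the script falls back to the patch level when the command is missing.

```tcl
# Sketch only. ::tcl::build-info is assumed to exist only on builds that
# implement TIP 599; the check below keeps the script usable elsewhere.
proc describeBuild {} {
    if {[llength [info commands ::tcl::build-info]]} {
        # With no argument the command returns the full build string;
        # the "commit" field is the check-in id of the sources.
        return "[tcl::build-info] (commit [tcl::build-info commit])"
    }
    return "[info patchlevel] (no tcl::build-info available)"
}
puts [describeBuild]
```

Quoting this string alongside any reported behaviour would make it possible to tell which side of a given commit a test was run on.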
|
|
From: Francois V. <fvo...@fr...> - 2023-01-28 17:02:37
|
On 28/01/2023 at 09:26, apnmbx-public--- via Tcl-Core wrote:
> If you, and through their silence, the TCT have pretty much decided...
I have now read several calls on the mailing list for more TCT members
to enter this discussion. Speaking for myself, I wanted to clarify
that my silence (on any topic, BTW) does not mean anything special,
and in particular neither approval nor disapproval.
Specifically regarding the topic under discussion, I'm in no way
proficient in the encoding matters currently discussed and I refrain
from adding noise to the already complicated exchanges. I leave it to
those who know so much more and better than me to make the best
decisions, enlightened by the entire community. Ashok made a great
effort in his summary document; I hope this will be considered and will
help in moving forward.
Plus my personal focus is on Tk, not really Tcl.
Thank you for your understanding.
Best regards,
Francois
|
|
From: Kevin W. <kw...@co...> - 2023-01-29 03:34:28
|
On 1/28/23 12:02 PM, Francois Vogel wrote:
> On 28/01/2023 at 09:26, apnmbx-public--- via Tcl-Core wrote:
>> If you, and through their silence, the TCT have pretty much decided...
>
> I have now read several calls on the mailing list for more TCT members
> to enter this discussion. Speaking for myself, I wanted to clarify
> that my silence (on any topic, BTW) does not mean anything special,
> and in particular neither approval nor disapproval.
>
> Specifically regarding the topic under discussion, I'm in no way
> proficient in the encoding matters currently discussed and I refrain
> from adding noise to the already complicated exchanges. I leave it to
> those who know so much more and better than me to make the best
> decisions, enlightened by the entire community. Ashok made a great
> effort in his summary document; I hope this will be considered and
> will help in moving forward.
>
>
I would echo everything Francois says here. I have mostly been silent on
these questions because they do not fall within my expertise.
Having said this, I appreciate this detailed analysis that Ashok has
prepared. Based on this, and the discussion by Rolf, Jan, PY and others,
it does appear that there are a lot of inconsistencies in how Tcl
handles encodings. It's likely these have accrued over time rather than
as the result of an overall design philosophy.
One point I am persuaded by from Ashok's discussion is that "-strict"
should be the default in terms of conformance to the Unicode standard.
This seems to be consistent with the rule of least surprise. I don't
know what implications this has for script-level or C-level APIs, or
whether it is possible to implement this in time for 9.0, but now
certainly seems the best time to discuss it.
--Kevin
|
|
From: Harald O. <har...@el...> - 2023-01-29 14:32:27
|
On 29.01.2023 at 04:34, Kevin Walzer wrote:
>
> On 1/28/23 12:02 PM, Francois Vogel wrote:
>> On 28/01/2023 at 09:26, apnmbx-public--- via Tcl-Core wrote:
>>> If you, and through their silence, the TCT have pretty much decided...
>>
>> I have now read several calls on the mailing list for more TCT members
>> to enter this discussion. Speaking for myself, I wanted to clarify
>> that my silence (on any topic, BTW) does not mean anything special,
>> and in particular neither approval nor disapproval.
>>
>> Specifically regarding the topic under discussion, I'm in no way
>> proficient in the encoding matters currently discussed and I refrain
>> from adding noise to the already complicated exchanges. I leave it to
>> those who know so much more and better than me to make the best
>> decisions, enlightened by the entire community. Ashok made a great
>> effort in his summary document; I hope this will be considered and
>> will help in moving forward.
>>
>>
> I would echo everything Francois says here. I have mostly been silent on
> these questions because they do not fall within my expertise.
>
> Having said this, I appreciate this detailed analysis that Ashok has
> prepared. Based on this, and the discussion by Rolf, Jan, PY and others,
> it does appear that there are a lot of inconsistencies in how Tcl
> handles encodings. It's likely these have accrued over time rather than
> as the result of an overall design philosophy.
>
> One point I am persuaded by from Ashok's discussion is that "-strict"
> should be the default in terms of conformance to the Unicode standard.
> This seems to be consistent with the rule of least surprise. I don't
> know what implications this has for script-level or C-level APIs, or
> whether it is possible to implement this in time for 9.0, but now
> certainly seems the best time to discuss it.
Dear Francois, Kevin,
thank you for writing this, I appreciate it.
IMHO, the main issue is that TIPs 346/601 are contradictory and only
the TCT can save us from that. Much depends on the order in which they
are applied.
In addition, TCL 8.7 is highly incompatible with 9.0, depending on the
application order.
And TIP 607 is incomplete.
We will see where we end up. I personally have no clue.
A higher engagement by the TCT (for instance, only Jan is active (I
appreciate it)) would be great.
The opinion of Jan is that, on TIP level, all is ok. This must be
accepted (so far).
Thank you all,
Harald
|
|
From: Rolf A. <tcl...@po...> - 2023-01-29 18:11:27
|
apnmbx-public--- via Tcl-Core writes:
> Here too, from the Jan 17 build artifact,
>
> % encoding convertto ascii \uDC00
> ?
>
> % encoding convertto utf-8 \uDC00
> unexpected character at index 0: 'U+00DC00'
>
> On Fri, 27 Jan 2023 at 22:21, Jan Nijtmans wrote:
> So, I am wondering which version of Tcl 9.0 you were testing.
It's the commit https://core.tcl-lang.org/tcl/info/515bfbe816ef7b13 from
2023-01-19 which makes the difference here.
Prior to that, the examples work (wrongly) as Ashok reported; since this
commit, this is fixed, as Jan showed.
Btw, @Francois, the code change which fixed the behaviour here had its
root in the ticket https://core.tcl-lang.org/tk/tktview?name=370b1ff03e -
getting the things discussed here right is of course also important for
a working Tk.
rolf
|
|
From: Jan N. <jan...@gm...> - 2023-01-29 19:13:27
|
On Sun, 29 Jan 2023 at 19:11, Rolf Ade wrote:
> It's the commit https://core.tcl-lang.org/tcl/info/515bfbe816ef7b13 from
> 2023-01-19 which makes the difference here.
>
> Prior to that the examples work (wrongly) as Ashok reported, since this
> commit this is fixed as Jan showed.
>
Another commit to be noted is this one from 2023-01-22:
<https://core.tcl-lang.org/tcl/info/57baf6fc1f334b3d>
It fixes this ticket:
<https://core.tcl-lang.org/tcl/info/a31caff057>
Since those 2 commits fix inconsistencies in the use of -strict, it
would be useful to have Ashok's document updated, checking whether all
inconsistencies reported regarding the use of "-strict" are gone now.
It doesn't make sense to start a discussion on making "-strict" the
default in Tcl 9.0 if there's still a discussion on what -strict should
do.
One thing is for sure: When using '-strict' (without -failindex), an
exception should be thrown for any 'illegal' bytes or code-points. I
don't want to discuss 'illegal': That's different for every encoding
(although it should be clear for utf-8/-16/-32). Not throwing an
exception when using -strict and encountering 'illegal' bytes or
code-points, that's a bug. Please report it (unless there's already a
ticket for it), and - even better - provide a test-case and/or patch.
Do we have an agreement on what '-strict' is supposed to do?
See also:
https://core.tcl-lang.org/tips/doc/trunk/tip/346.md
Regards,
Jan Nijtmans
|
|
From: Rolf A. <tcl...@po...> - 2023-01-30 00:36:46
|
"Kevin Walzer" <kw-...@pu...> writes:
> On 1/28/23 12:02 PM, Francois Vogel wrote:
>> On 28/01/2023 at 09:26, apnmbx-public--- via Tcl-Core wrote:
>>> If you, and through their silence, the TCT have pretty much decided...
>>
>> I have now read several calls on the mailing list for more TCT
>> members to enter this discussion. Speaking for myself, I wanted
>> to clarify that my silence (on any topic, BTW) does not mean
>> anything special, and in particular neither approval nor disapproval.
>>
>> Specifically regarding the topic under discussion, I'm in no way
>> proficient in the encoding matters currently discussed and I refrain
>> from adding noise to the already complicated exchanges. I leave it
>> to those who know so much more and better than me to make the best
>> decisions, enlightened by the entire community. Ashok made a great
>> effort in his summary document; I hope this will be considered and
>> will help in moving forward.
>>
>>
> I would echo everything Francois says here. I have mostly been silent
> on these questions because they do not fall within my expertise.
I only partly buy into this argument. Most, if not all, discussions
about this stuff in the last 10 or so weeks are not about details of
implementation (which indeed should have their place in tickets) but
about how things work and how they should work from a much higher level.
To explain better what I mean, let me take as an example the Tcl 9
script level behaviour of the read and gets commands. Several times I
tried to raise this on tcl-core and Ashok also discusses this in his
paper.
After some repetition by me this topic in fact got attention (and in the
meantime even two branches with implementations of alternatives by
Nathan).
But all of the TCT members stayed completely away from this - why is
this? Again, this is not about implementation detail or whether this
byte sequence read from a channel with this encoding should result in
that byte sequence if written out to a channel with that encoding. It is
about the semantics of important and familiar commands. I assume you
have an opinion about that?
Even in cases where the "do not fall within my expertise" argument may
have some weight, there is another level on which I miss the presence
and activity of the TCT members - let's call it moderation. Regardless
of the details, it typically shows clearly if there is fundamental
disagreement.
At that point someone else should try to help sort things out. If a
"nobody" like me has a discussion with a TCT member this is a
discussion between unequals (to avoid words like "uphill battle", we
hopefully don't have battles and fights but try to find better
solutions). We lately had a daunting example.
Another thought: We are close to having a first 9.0 beta. Tcl 9.0 has a
longer history with several major contributors but it is obviously true
that Jan has put tremendous work into it within the last year to push
it forward to that "we're close to beta" state.
But this perhaps makes him not the best one to judge the issues with the
9.0 changes raised by the early adopters - here your opinion and
experience are called for, TCT members.
Raising discussion about obviously problematic parts prior to the beta
release should be welcome. Unfortunately it feels more like Sisyphus
rolling his stone.
rolf
|
|
From: Steve L. <st...@di...> - 2023-01-30 01:16:54
|
> But all of the TCT members stayed completely away from this - why is
> this?
With respect Rolf, I have been trying to moderate the discussion both on
TCLCORE and on tkchat as best as I can. The fact that people whose
opinions I value have been in apparent conflict has concerned me greatly
and I wouldn't be surprised if other TCT members felt the same.
Also, those of us on tkchat have been aware that Ashok has been working
on his document for some time. Now that it is here we are in a better
position to understand the nuances of the arguments.
Perhaps the time has come for a "face to face" discussion on the
outstanding issues. I realise that may be less than ideal for those less
comfortable with spoken English but at least it might enable all
interested parties to try and make progress towards a consensus as to
the way forward.
If this is something that you, Jan, Ashok, Nathan, Harald and others
support then we could schedule a meeting at a suitable time using the
Zoom channel. If we go down this path I would encourage all TCT members
to be present if possible.
|
|
From: Poor Y. <org...@po...> - 2023-01-30 10:16:00
|
On 2023-01-30 03:16, Steve Landers wrote:
...
>
> If this is something that you, Jan, Ashok, Nathan, Harald and others
> support then we could schedule a meeting at a suitable time using the
> Zoom channel. If we go down this path I would encourage all TCT
> members to be present if possible.
>
It does seem that lack of communication, or lack of constructive
communication, is one of the issues. Some face-to-face meetings could be
the key, and more ongoing communication, either on this mailing list or
on #tcl, could also be helpful. For me the mailing list is
higher-friction than #tcl, because I don't want to send out a mail
unless I've done a good amount of thinking about it first. Sometimes
that's the right thing, but sometimes brainstorming together is useful,
and that doesn't happen so much on the mailing list.
Another thing is that TIPs are useful for voting purposes, but not
always so useful for working through trouble spots during development. I
would propose that this issue could be solved by working on a branch off
trunk dedicated to it until everyone is happy with that branch. It could
be one of the two I've created for the purpose, or a new one started by
someone else. Such a branch could also provide something constructive to
talk about, and maybe encourage people that don't always directly hack
on Tcl to jump in too. One reason I propose a branch off trunk is that I
think Tcl 9 should be relatively free of the baggage needed in 8.7, but
when working forward from 8.7 to 9.0, that baggage also has a tendency
to move forward.
--
Yorick
|
|
From: <apn...@ya...> - 2023-02-01 11:49:25
|
Fine by me.
From: Steve Landers <st...@di...>
Perhaps the time has come for a "face to face" discussion on the
outstanding issues. I realise that may be less than ideal for those less
comfortable with spoken English but at least it might enable all
interested parties to try and make progress towards a consensus as to
the way forward.
If this is something that you, Jan, Ashok, Nathan, Harald and others
support then we could schedule a meeting at a suitable time using the
Zoom channel. If we go down this path I would encourage all TCT members
to be present if possible.
|
|
From: Harald O. <har...@el...> - 2023-01-30 08:04:51
|
On 30.01.2023 at 02:16, Steve Landers wrote:
>
>> But all of the TCT members stayed completely away from this - why is
>> this?
>
> With respect Rolf, I have been trying to moderate the discussion both on
> TCLCORE and on tkchat as best as I can. The fact that people whose
> opinions I value have been in apparent conflict has concerned me greatly
> and I wouldn't be surprised if other TCT members felt the same.
>
> Also, those of us on tkchat have been aware that Ashok has been working
> on his document for some time. Now that it is here we are in a better
> position to understand the nuances of the arguments.
>
> Perhaps the time has come for a "face to face" discussion on the
> outstanding issues. I realise that may be less than ideal for those less
> comfortable with spoken English but at least it might enable all
> interested parties to try and make progress towards a consensus as to
> the way forward.
>
> If this is something that you, Jan, Ashok, Nathan, Harald and others
> support then we could schedule a meeting at a suitable time using the
> Zoom channel. If we go down this path I would encourage all TCT members
> to be present if possible.
+1 from my side. Any initiative to get out of this dead end is great!
For me, there are the following general decisions:
D1) encoding error reporting mode: remove "-nocomplain" (TIP601) or
"-strict" (TIP 346) (or, limit strict to modify the encoding and not
modify the reporting mode)
D2) define the TCL 8.7 and TCL 9.0 default encoding error reporting mode
(TIP601 and TIP 346 are contradictory). This results from D1)
D3) extend TIP607 to feature a fail reason reporting (incomplete
sequence or encoding error)
Thank you and take care,
Harald
|
|
From: Christian G. <aur...@gm...> - 2023-01-30 09:32:39
|
On 30.01.23 at 09:04, Harald Oehlmann wrote:
> For me, there are the following general decisions:
> D1) encoding error reporting mode: remove "-nocomplain" (TIP601) or
> "-strict" (TIP 346) (or, limit strict to modify the encoding and not
> modify the reporting mode)
When we talk about renaming these options, how about an "error mask"?
I.e. a way to specify on which errors the encoding should stop, e.g.
-erroron {surrogates invalid wrongcode} ....
where these names should be the ones defined by the Unicode consortium.
Then everyone can pick their own failure mode. The same codes should then
also be reported when the error occurs.
Which one of these shall become the default is then purely bikeshedding.
Christian
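To make the error-mask idea more concrete, here is a script-level sketch. Everything in it is hypothetical: `checkErrorOn` is an invented helper standing in for the proposed `-erroron` option, the `ENCODING` error code layout is an assumption, and only the `surrogates` class is implemented because the thread does not define what `invalid` or `wrongcode` would cover.

```tcl
# Sketch only: "checkErrorOn" is a hypothetical script-level stand-in for
# the proposed -erroron option; nothing like it exists in Tcl itself.
# Only the "surrogates" class is implemented here; other names in the mask
# are accepted but ignored.
proc checkErrorOn {mask str} {
    set index 0
    foreach ch [split $str ""] {
        set cp [scan $ch %c]
        if {$cp >= 0xD800 && $cp <= 0xDFFF && "surrogates" in $mask} {
            # Report the same class name that was requested in the mask
            return -code error \
                -errorcode [list ENCODING surrogates $index] \
                [format "surrogates at index %d: U+%06X" $index $cp]
        }
        incr index
    }
    return $str
}

# With "surrogates" in the mask the lone surrogate is rejected ...
if {[catch {checkErrorOn {surrogates invalid} "a\uDC00"} msg opts]} {
    puts "$msg (errorcode: [dict get $opts -errorcode])"
}
# ... and with an empty mask the same string passes through unchanged.
puts [string length [checkErrorOn {} "a\uDC00"]]
```

Carrying the class name in the error code is one way to meet the suggestion that the same codes be reported when the error occurs, so a caller can branch on it without parsing the message.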
|
|
From: Rolf A. <tcl...@po...> - 2023-01-30 11:30:55
|
Christian Gollwitzer writes:
> On 30.01.23 at 09:04, Harald Oehlmann wrote:
>> For me, there are the following general decisions:
>> D1) encoding error reporting mode: remove "-nocomplain" (TIP601) or
>> "-strict" (TIP 346) (or, limit strict to modify the encoding and not
>> modify the reporting mode)
>
> When we talk about renaming these options, how about an "error mask"?
> I.e. a way to specify on which errors the encoding should stop, e.g.
>
> -erroron {surrogates invalid wrongcode} ....
>
> where these names should be the ones defined by the Unicode consortium.
> Then everyone can pick their own failure mode. The same codes should then
> also be reported when the error occurs.
I haven't said much about this part of the discussion here on tcl-core
(I did on the tclers chat), because talking about too many different
things at once would just add more confusion to an obviously complex
discussion.
But if the TCT would decide to open the discussion about the
functionality in question here - which I would welcome - then I surely
would argue for something along the lines Christian proposes here.
rolf
|
|
From: <apn...@ya...> - 2023-02-01 11:14:29
|
I’ve updated the document at Unicode in Tcl 9 (magicsplat.com) <https://www.magicsplat.com/tcl9/tcl9unicode.html> and included the fossil commit id for reference. I agree that whether -strict is the default or not is a secondary question. The primary question to be answered is whether the combination of default behaviour, -strict and -nocomplain cover required error handling behaviors. My answer is no. When invalid byte sequences are encountered, at least the following behaviors are possibly desired in theory: 1. Treat as an error (either by raising an exception or via the -failindex mechanism) 2. Replace with an encoding-specific character in the target encoding (U+FFFD, question mark etc.) 3. Replace with a lossless internal representation (specific use cases filenames, environment vars, system apis) 4. Replace with a user-defined character 5. Replace with the numerically equivalent code point (Tcl8 behavior and current default) 6. Discard the byte(s) (seen as an option in Python etc.) As per the Unicode standard, options 1 and 2 are conformant. Option 3 is semi-blessed (as in recommended for specific use cases as discussed in the write up). Tcl 9 implements Option 1 (-strict) and 5 (implicit default, albeit with some caveats for out-of-range values). I believe it is important to support (2) and (3); the former because applications expect it, latter because it allows for correct operation when interacting with the system (see write up). (5) and (6) are in my opinion broken behaviors but let us assume (5) at least is mandated for Tcl 8 compatibility. Now the point of discussion may be: If you think standard conformant (2) and (3) are not useful, now or in the future, then that becomes the point of debate. The argument is whether (1) and (5) suffice for all time to come. However, if you agree (2) and (3) are useful, or that other behaviors may be desirable in the future, the discussion becomes how best to add them in 9.0 or 9.1. 
In the current -strict/-nocomplain model, one would likely have to add -replace (2), -lossless (3), -discard (6) etc. and the equivalent -encodinglossless 0/1, -encodingreplace 0/1 etc. to fconfigure. Obviously, these would be mutually exclusive. Having multiple mutually exclusive options is confusing and not good design.
Following TIP 654, the model if not the specifics, we would instead have
-profile strict, fconfigure -encodingprofile strict for (1)
-profile replace, fconfigure -encodingprofile replace (2)
-profile lossless, fconfigure -encodingprofile lossless (3)
-profile \UXXXX, fconfigure -encodingprofile \UXXXX (4) (meh, not sure I like that)
-profile tcl8, fconfigure -encodingprofile tcl8 (5)
-profile discard, fconfigure -encodingprofile discard (6)
etc.
which I think is a much cleaner, more extensible interface.
/Ashok
From: Jan Nijtmans <jan...@gm...>
Since those 2 commits fix inconsistencies in the use of -strict, it would be useful to have Ashok's document updated, checking whether all inconsistencies reported regarding the use of "-strict" are gone now. It doesn't make sense starting a discussion on making "-strict" the default in Tcl 9.0, if there's still a discussion on what -strict should do.
One thing is for sure: When using '-strict' (without -failindex), an exception should be thrown for any 'illegal' bytes or code-points. I don't want to discuss 'illegal': That's different for every encoding (although it should be clear for utf-8/-16/-32). Not throwing an exception when using -strict and encountering 'illegal' bytes or code-points, that's a bug. Please report it (unless there's already a ticket for it), and - even better - provide a test-case and/or patch.
Do we have an agreement on what '-strict' is supposed to do?
See also: https://core.tcl-lang.org/tips/doc/trunk/tip/346.md
Regards,
Jan Nijtmans
|
|
From: <apn...@ya...> - 2023-02-01 11:45:21
|
A comment on Christian's -erroron mask suggestion.
-erroron would define what constitutes an error. But it does not say what
should be done in case of that error which I think is the more important
issue to address.
So for example, if \xC0 is encountered in [encoding convertfrom utf-8],
should that be mapped to U+00C0, mapped to U+FFFD, raise an exception etc. I
think that is more important than distinguishing between error cases like
surrogate in utf-8 vs \xC0 in utf-8.
So while it may have some use, it doesn't really address the current
discussion.
/Ashok
> -----Original Message-----
> From: Christian Gollwitzer <aur...@gm...>
> When we talk about renaming these options, how about an "error mask"?
> I.e. a way to specify on what errors the encoding should stop. e.g.
>
> -erroron {surrogates invalid wrongcode} ....
>
> where these names should be the ones defined by the Unicode consortium.
> Then everyone can pick their own failuremode. The same codes should then
> also be reported when the error occurs.
>
> Which one of these shall become the default, is then purely bikeshedding.
>
>
> Christian
>
>
> _______________________________________________
> Tcl-Core mailing list
> Tcl...@li...
> https://lists.sourceforge.net/lists/listinfo/tcl-core
|
|
From: Jan N. <jan...@gm...> - 2023-02-01 13:21:49
|
Op wo 1 feb. 2023 om 12:14 schreef apnmbx-public--- via Tcl-Core <tcl...@li...>:
> I’ve updated the document at Unicode in Tcl 9 (magicsplat.com)
> <https://www.magicsplat.com/tcl9/tcl9unicode.html> and included the
> fossil commit id for reference.
>
Thanks!
> I agree that whether -strict is the default or not is a secondary question.
>
OK. Let's set that aside for now.
> If you think standard conformant (2) and (3) are not useful, now or in the
> future, then that becomes the point of debate. The argument is whether (1)
> and (5) suffice for all time to come.
>
I'm not saying that (2) and (3) are not useful. They are not on my radar
(never were), but I won't stand in the way if someone wants to implement it.
> In the current -strict/-nocomplain model, one would likely have to add
> -replace (2), -lossless (3), -discard (6) etc
>
> and the equivalent -encodinglossless 0/1, -encodingreplace 0/1 etc. to
> fconfigure. Obviously, mutually exclusive. This is confusing and not good
> design to have multiple mutually exclusive options.
>
Agreed that it's not a good idea to add more and more flags. Personally,
I don't like -nocomplain, but I see the need for it. I would be in favor of
deprecating it (in Tcl 9.1) and removing it (in Tcl 10.0), but now it's
no time for dreaming ;-)
> Following TIP 654, the model if not the specifics, we would instead have
>
> -profile strict, fconfigure -encodingprofile strict for (1)
>
> -profile replace, fconfigure -encodingprofile replace (2)
>
> -profile lossless, fconfigure -encodingprofile lossless (3)
>
> -profile \UXXXX, fconfigure -encodingprofile \UXXXX (4) (meh, not sure I
> like that)
>
> -profile tcl8, fconfigure -encodingprofile tcl8 (5)
>
> -profile discard, fconfigure -encodingprofile discard (6)
>
> etc.
>
> which I think is a much cleaner, more extensible interface.
>
I could imagine TIP #654 (or something like this) being implemented
in the future.
Adding more flags is not a good idea, I agree, something like TIP #654
is much cleaner. The -strict/-nocomplain model is not meant to be
complete, to be used until the end of time. It's simple and serves
what most people (least-surprise rule) expect (I hope).
There is one problem with accepting TIP #654, and that's the
availability of an implementation. Can it be added later? Yes,
I think so, just adding a new "-profile" option and making "-strict"
synonym with "-profile strict". That sounds simple. In practice,
it isn't. The Tcl channel system is quite complicated, showing
its age. Some day it should be replaced, e.g. by ICU or so.
The profiles have to be implemented by _all_ encodings,
that's a hell of a job. I don't see it happening any time soon.
If someone would start implementing TIP #654, I'm there
offering my help. But I'm afraid it would take too much
time. People want Tcl 8.7/9.0 tomorrow, they already
waited too much time. That's the challenge we
are facing.
Hope this helps,
Jan Nijtmans
|
|
From: <apn...@ya...> - 2023-02-01 15:22:38
|
Thanks, I have a better understanding of your position now.
I can imagine implementing all the profiles would be significant work and risk. However, wouldn’t just mapping (at the option parsing level),
-profile strict -> flags for -strict
-profile “tcl8” -> implicit default flags
-profile nocomplain -> -nocomplain flags (although I would prefer just getting rid of this option)
(and fconfigure equivalents) be fairly straightforward? It would then permit adding other profiles for 9.1 without any -strict, -nocomplain baggage in the future. Those would require deeper changes in the encoding and maybe channel implementations.
If this seems a reasonable compromise, I can take a stab at it unless Nathan already has.
/Ashok
From: Jan Nijtmans <jan...@gm...>
There is one problem with accepting TIP #654, and that's the
availability of an implementation. Can it be added later? Yes,
I think so, just adding a new "-profile" option and making "-strict"
synonym with "-profile strict". That sounds simple. In practice,
it isn't. The Tcl channel system is quite complicated, showing
its age. Some day it should be replaced, e.g. by ICU or so.
The profiles have to be implemented by _all_ encodings,
that's a hell of a job. I don't see it happening any time soon.
If someone would start implementing TIP #654, I'm there
offering my help. But I'm afraid it would take too much
time. People want Tcl 8.7/9.0 tomorrow, they already
waited too much time. That's the challenge we
are facing.
Hope this helps,
Jan Nijtmans
|
|
From: <apn...@ya...> - 2023-02-01 15:26:58
|
One more thought just came to mind. Adding support for the “lossless” encoding and modifying file name commands, open, environment access etc. to use it would result in a change in the default encoding used for those commands. I don’t know whether, if that were postponed to 9.1, it would be treated as an incompatible change between 9.0 and 9.1.
/Ashok
From: apnmbx-public--- via Tcl-Core <tcl...@li...>
Sent: Wednesday, February 1, 2023 8:52 PM
To: tcl...@li...
Subject: Re: [TCLCORE] Unicode in Tcl 9 - a commentary and critique
|
|
From: Jan N. <jan...@gm...> - 2023-02-01 16:06:32
|
Op wo 1 feb. 2023 om 16:27 schreef apnmbx-public--- via Tcl-Core:
> One more thought just came to mind. Adding support for the “lossless”
> encoding and modifying file name commands, open, environment access etc. to
> use it would result in a change in the default encoding used for those
> commands. I don’t know if that was postponed to 9.1, it would be treated as
> an incompatible change between 9.0 and 9.1.
>
Let me think on that
Hope this helps,
Jan Nijtmans
|
|
From: Jan N. <jan...@gm...> - 2023-02-01 15:53:13
|
Op wo 1 feb. 2023 om 16:22 schreef apnmbx-public--- via Tcl-Core:
> -profile strict -> flags for -strict
>
> -profile “tcl8” -> implicit default flags
>
> -profile nocomplain -> -nocomplain flags (although I would prefer just
> getting rid of this option)
>
Then I would suggest:
-profile strict -> flags for -strict
-profile tcl8 -> flags for -nocomplain (since this is the
default for Tcl 8)
-profile {} -> no flags (default for Tcl 9)
> (and fconfigure equivalents) be fairly straightforward?
>
Indeed
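The mapping suggested above could be done entirely at the option-parsing level. A minimal sketch, assuming a hypothetical helper named profileToFlags (it is not part of Tcl) that turns a -profile value into the legacy flag list that would be passed on to [encoding convertfrom] or [encoding convertto]:

```tcl
# Hypothetical helper: translate a -profile value into the legacy flags,
# following the mapping proposed in this message.
proc profileToFlags {profile} {
    switch -exact -- $profile {
        strict  { return [list -strict] }
        tcl8    { return [list -nocomplain] }
        ""      { return [list] }
        default {
            error "bad profile \"$profile\": must be strict, tcl8, or {}"
        }
    }
}
```

Because the profile name is resolved to flags before any conversion starts, the encoders themselves would not need to change for these three cases. Profiles such as replace or lossless have no legacy flag to map to, so they would still need the deeper encoding changes mentioned earlier in the thread.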
Regards,
Jan Nijtmans
|