From: Harald O. <har...@el...> - 2023-01-09 13:35:54
|
Dear Donal, dear all,

thank you for your valuable opinion! For me, the ZIP file name is a special case. I suppose a good heuristic would be:
- check for printable ASCII -> ok
- use UTF-8, and check for correct encoding and printable characters
- use ISO 8859-1 and check for printable characters
- do other checks.

In my case, the default Windows western code page cp1252 is also a good candidate. Or do we check for printable ASCII / UTF-8 / current system encoding? Or we specify the encoding.

Anyway, the new encoding option to throw an error gives us an additional tool to work with this issue.

---
Please allow me to wish a happy and peaceful new year to you all. Steve, is there a Tcl monthly meetup tomorrow?
---
I also appreciate all the work on the channel system. It is great and complicated work.
---

Thank you all,
Harald

On 09.01.2023 at 13:31, Donal Fellows wrote:
> I'm of the opinion that we should keep I/O largely compatible with 8 if possible, as anything else will be a total pain for users (and one that is comparatively hard for them to check for). Breaking things like this would need to be justified, and "it's a major version change so we're allowed to do it" isn't enough justification by itself. That said, if someone explicitly specifies an option that requests behaviour that was not previously possible, that's totally OK; they volunteered to have the incompatibility. A -strict option (or a utf8-strict encoding, or however it is written) is fine, provided it is not applied to any channel by default. That specifically includes the standard channels.
>
> The main use case for a failure to convert to/from an encoding is when the application has a non-standard way of handling this: in the case of the code that opened the can of worms in the first place, it was the (awful!) convention inside ZIP files where strings can be either UTF-8 or ISO 8859-1 (or whatever random vendor encoding was in use on the generating system!) without metadata indicating what was actually used. In that case, the default strategy of 8, replacing impossible characters with ? or U+00FFFD or whatever (as specified by the encoding), is unsuitable. To me, that smacks of being the sort of thing you apply when using the encoding command explicitly, not in I/O channels or working with filenames. But someone opting in for a channel is still fine.
>
> Donal.
>
> ________________________________________
> From: apnadkarni--- via Tcl-Core <tcl...@li...>
> Sent: 31 December 2022 07:26
> To: tcl...@li...
> Subject: Re: [TCLCORE] More on I/O with Tcl 9
>
> I’m not sure I understand the term critiquing the data.
>
> At a conceptual level, I understand the separation between i/o errors and content errors. But consider that Tcl 8 is already looking at content when it does encoding transforms, cr-lf translation, ^Z eof processing etc. And once it does encoding transforms, there has to be some mechanism for dealing with invalid encodings. Tcl 8 blithely ignored these errors. Tcl 9 does not (at least with -strict) and with good reason. And once it detects encoding errors at the channel, there must be some mechanism to convey this to the application.
>
> One could argue that channels should stick to i/o and content processing should be done via [encoding convert*] but that would be (a) a monumental change from Tcl 8 and (b) have very negative implications in terms of both efficiency and convenience in processing streaming data.
>
> /Ashok
>
> From: Steve Landers <st...@di...>
> Sent: Saturday, December 31, 2022 6:34 AM
> To: bch <bra...@gm...>; Brian Griffin <bri...@ea...>
> Cc: tcl...@li...
> Subject: Re: [TCLCORE] More on I/O with Tcl 9
>
> On 31 Dec 2022 at 8:50 AM +0800, Brian Griffin <bri...@ea...>, wrote:
>
> On Dec 30, 2022, at 3:19 PM, bch <bra...@gm...> wrote:
>
> Apologies in advance; I’ve got no ideas to contribute at the moment, and might also simply be off-base. With that out of the way -
> Are we getting too close to the developer (the Joe or Jane Smith writing some app in Tcl) having to know more of the implementation details of Tcl I/O than they should? I offer this as a genuine question. This question is either a reality-check, ignorant and inconsequential, or somewhere in between, I suppose. Looking forward to finding out.
> -bch
>
> +1
> I have a similar concern. It feels generally wrong to me that the I/O system is critiquing the data. The only errors I/O operations should report are channel failures, not content failures.
>
> -Brian
>
> +1 from me.
>
>
> _______________________________________________
> Tcl-Core mailing list
> Tcl...@li...
> https://lists.sourceforge.net/lists/listinfo/tcl-core

--
ELMICRON Dr. Harald Oehlmann GmbH
Koesener Str. 85
06618 NAUMBURG - Germany
Phone: +49 3445 781120
Direct: +49 3445 781127
www.Elmicron.de
German legal references: Managing Director: Dr. Harald Oehlmann
UST Nr. / VAT ID No.: DE206105272
HRB 212803 Stendal
|
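Harald's heuristic for ZIP file names (printable ASCII first, then strictly decoded UTF-8, then a single-byte code page) might be sketched roughly as follows. This is an illustrative Python sketch, not Tcl and not code from the thread: the function name `guess_zip_name` is hypothetical, Python's codecs stand in for Tcl's encodings, `str.isprintable()` is assumed as the meaning of "printable", and the fallback order (ISO 8859-1 before cp1252) is only one possible reading of the message.

```python
def guess_zip_name(raw, fallbacks=("iso8859-1", "cp1252")):
    """Return (text, encoding) for a ZIP entry name of unknown encoding.

    Hypothetical helper: tries candidates in the order suggested in the
    message above and accepts the first one that decodes strictly AND
    yields only printable characters.
    """
    # ASCII and UTF-8 are tried first; both fail loudly on invalid bytes.
    for encoding in ("ascii", "utf-8") + tuple(fallbacks):
        try:
            text = raw.decode(encoding, errors="strict")
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
        # Assumption: "printable" means str.isprintable(), which rejects
        # control characters (including the C1 range U+0080..U+009F).
        if text.isprintable():
            return text, encoding
    # "do other checks" from the list is left to the caller.
    raise ValueError("no candidate encoding produced a printable name")


if __name__ == "__main__":
    for sample in (b"readme.txt", "K\u00e4se.txt".encode("utf-8"),
                   b"K\xe4se.txt", b"\x80uro.txt"):
        print(sample, "->", guess_zip_name(sample))
```

The sketch only works because every decoding step can fail: a decoder that silently substitutes a replacement character would make the first multi-byte candidate "succeed" on any input, which is the point about the error-throwing option made above.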
From: Poor Y. <org...@po...> - 2023-01-10 09:35:44
|
On 2023-01-09 14:31, Donal Fellows wrote:
...
> The main use case for a failure to convert to/from an encoding is when
> the application has a non-standard way of handling this: in the case of
> the code that opened the can of worms in the first place, it was the
> (awful!) convention inside ZIP files where strings can be either UTF-8
> or ISO 8859-1 (or whatever random vendor encoding was in use on the
> generating system!) without metadata indicating what was actually used.
> In that case, the default strategy of 8, replacing impossible
> characters with ? or U+00FFFD or whatever (as specified by the
> encoding), is unsuitable. To me, that smacks of being the sort of thing
> you apply when using the encoding command explicitly, not in I/O
> channels or working with filenames. But someone opting in for a channel
> is still fine.
>

If this is awful, then so are POSIX filesystems, which operate in precisely the same way: create some files on a POSIX filesystem on a portable drive, move the drive to another computer, and the same issues can arise. This issue isn't limited to ZIP files or POSIX filesystems either: structured documents in formats such as CSV and XML get created in a variety of ways, and even where format specifications require a certain encoding, it's entirely possible for the need to decode from another encoding to arise. The diversity of existing encodings, even among structured data formats, is why I think "-strict" must be the default in Tcl 9. This is a change that can be dealt with almost mechanically when migrating a large code base from Tcl 8 to Tcl 9, so the burden is not as great as some other arguments make it seem to be.

Furthermore, encoding/decoding behaviour in Tcl 8 is so broken that any attempt to remain backwards-compatible with it is crippling to Tcl 9, just as the broken behaviour in Tcl 8 has been crippling to the cause of Tcl. Encoding/decoding issues are a dealbreaker to any project assessing a programming language for potential use.

--
Yorick
|
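The two behaviours being argued over, silently replacing undecodable input versus reporting it, can be made concrete with a small sketch. This is Python rather than Tcl, and the helper names `decode_lossy` and `decode_strict` are hypothetical; Python's `errors="replace"` and `errors="strict"` handlers are used here only as stand-ins for the Tcl 8 default and the proposed "-strict" behaviour.

```python
RAW = b"K\xe4se.txt"  # "Käse.txt" in ISO 8859-1; not valid UTF-8


def decode_lossy(data):
    """Replace undecodable bytes with U+FFFD (analogous to the Tcl 8 default)."""
    return data.decode("utf-8", errors="replace")


def decode_strict(data):
    """Return the decoded text, or None if the bytes are not valid UTF-8."""
    try:
        return data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    # The original byte 0xE4 is gone after lossy decoding; the caller
    # cannot tell that a different encoding should have been tried.
    print(repr(decode_lossy(RAW)))
    # Strict decoding signals the problem, so the caller can fall back.
    print(repr(decode_strict(RAW)))
```

With lossy decoding the application receives a well-formed string and no hint that information was lost; with strict decoding it receives an explicit failure that it has to handle, which is where the migration cost discussed in the thread comes from.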
From: Jan N. <jan...@gm...> - 2023-01-10 09:44:57
|
On Tue, 10 Jan 2023 at 10:36, Poor Yorick wrote:
> The diversity of existing
> encodings, even among structured data formats, is why I think "-strict"
> must be the default in Tcl 9
....
> Furthermore, encoding/decoding behaviour in Tcl 8 is so broken that any
> attempt
> to remain backwards-compatible with it is crippling to Tcl 9
>

It will be clear from my previous comments that I totally disagree with this remark. I won't repeat the arguments, but Donal worded it quite well (Thanks, Donal!)

Regards,
Jan Nijtmans
|
From: Rolf A. <tcl...@po...> - 2023-01-12 00:30:36
|
Jan Nijtmans writes:
> On Tue, 10 Jan 2023 at 10:36, Poor Yorick wrote:
>
>> The diversity of existing
>> encodings, even among structured data formats, is why I think "-strict"
>> must be the default in Tcl 9
>
> ....
>> Furthermore, encoding/decoding behaviour in Tcl 8 is so broken that any
>> attempt
>> to remain backwards-compatible with it is crippling to Tcl 9
>>
>
> It will be clear from my previous comments that I totally disagree with
> this remark. I won't repeat the arguments, but Donal worded it
> quite well (Thanks, Donal!)

The Tcl 8 ship sailed long ago. Tcl 9 is preparing to cast off. This is the last chance to argue about its I/O defaults.

It seems reasonable to me to have a backwards-compatible encoding/decoding behaviour. I hope it will not be the default (and yes, I also have a bigger code base to migrate), but that is not decided by me.

But besides the question of what the defaults are, there are enough open questions with respect to Tcl 9 I/O and Unicode. There are even TIPs (I don't agree with 652, I should say). Perhaps it would be better to discuss those?

rolf
|