From: James C. <jpg...@us...> - 2020-11-22 11:47:23
Hello Bernard,

thanks for the detailed summary and discussions, which explain the problems well! The solution this is converging towards (the secondary encoding, which is the user’s responsibility) makes perfect sense, and I’ll say no more.

All the best,
James

> On 21 Nov 2020, at 16:38, Bernard Desgraupes via AlphaCocoa-devel <alp...@li...> wrote:
>
> Dear Andreas,
> thank you for your suggestions, which go way beyond the initial request contained in this Ticket.
>
> As far as this ticket is concerned, I'd like to clarify what happens when Alpha reads a file. The file contains bytes and Alpha must translate (not convert) these bytes to letters. To make this possible, the user must specify the encoding (which serves as a translation table). If your file contains the byte 0xE9, here is what happens depending on the encoding declared by the user:
> * if the user told Alpha to use Latin1 encoding (ISO-8859-1), Alpha translates the byte to 'é'
> * if the user told Alpha to use macRoman encoding, Alpha translates the byte to 'È'
> * if the user told Alpha to use the UTF8 encoding form, there is an error because in UTF8 the byte 0xE9 is only valid as the start of a multi-byte sequence and is rejected when the bytes that follow are not continuation bytes.
>
> So when I wrote that opening a Latin1 file in macRoman encoding yields "wrong" characters, what is wrong there is that the user misled Alpha. Alpha just does what it was told to do.
>
> If you ask to open any file in any 1-byte encoding (like macRoman, Latin1, ISO-8859-7 for Greek, KOI8 for Russian, etc.), there will never be an error message. OTOH, you may see an error message when the input encoding is UTF8 because some bytes are forbidden depending on their position in multi-byte sequences, so Alpha will tell you that your file cannot be UTF8 because it contains invalid sequences.
>
> The only "conversion" which takes place occurs when Alpha builds its internal buffer, which contains only UTF16 two-byte sequences. The user should not be concerned by this: it is only an internal representation.
> When you save your modified file, Alpha performs the necessary backward conversion to write out the proper bytes in the desired output encoding.
>
> The original request made in this ticket means that there should be a fallback mechanism when Alpha detects that the file is not valid UTF8: it suggests using some heuristics to guess what the encoding could be. Unfortunately there is no reliable method for detecting an encoding. James suggested using the file command line tool, which tries to guess the encoding, but it can easily be fooled and give wrong answers. I would not rely on it.
>
> Joachim suggested that the user define a sort of secondary encoding (or fallback encoding) that Alpha would use silently when UTF8 fails: this could be useful, but it is still the user's responsibility to specify a secondary encoding that suits her needs. If nearly all your non-UTF8 files are in macRoman, it makes sense to set this secondary encoding to macRoman. But if the file was Latin1, well, too bad... you just misled Alpha.
>
> [tickets:#240] Encoding
>
> Status: open
> Created: Sat Oct 24, 2020 07:43 AM UTC by James Connolly
> Last Updated: Fri Nov 20, 2020 05:28 PM UTC
> Owner: nobody
>
> Hello, first thanks for all the great work from a long term user (≈1997 on)!
>
> A suggestion: Would it be possible to have a "Default encoding: Check file on open"?
>
> This "check file on open" would, upon opening a file, have Alpha check the file's encoding (e.g. with the BSD "file" command) and use that. And only then open the "Encodings popup" if this fails.
>
> At present I have many files in ISO-8859 and the pop-up is working rather hard. And I'm not inclined to convert them all to UTF-8 in order to avoid further potential problems: if Alpha can handle ISO-8859, why not silently continue to do so where appropriate is my thinking.
>
> Cheers, James
>
> p.s. not filed as "bug" nor "task" but as "RFE" which I hope means "suggestion".
>
> Sent from sourceforge.net because alp...@li... is subscribed to https://sourceforge.net/p/alphacocoa/tickets/
>
> To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/alphacocoa/admin/tickets/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.
>
> _______________________________________________
> AlphaCocoa-devel mailing list
> Alp...@li...
> https://lists.sourceforge.net/lists/listinfo/alphacocoa-devel

---

** [tickets:#240] Encoding**

**Status:** open
**Created:** Sat Oct 24, 2020 07:43 AM UTC by James Connolly
**Last Updated:** Sat Nov 21, 2020 03:38 PM UTC
**Owner:** nobody
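
For readers who want to try the 0xE9 example from the quoted message, here is a minimal Python sketch. It is only an illustration under stated assumptions: Python's built-in codecs stand in for Alpha's translation tables, and the helper names `translate` and `open_with_fallback` are hypothetical, not part of Alpha. The second helper mimics the "secondary encoding" idea: try UTF-8 first, and silently use a user-chosen fallback when the bytes are not valid UTF-8.

```python
import unicodedata
from typing import Tuple

SAMPLE = bytes([0xE9])  # the single byte discussed in the thread


def translate(data: bytes, encoding: str) -> str:
    """Interpret raw bytes with the encoding the user declared."""
    return data.decode(encoding)


def open_with_fallback(data: bytes, fallback: str = "mac_roman") -> Tuple[str, str]:
    """Hypothetical helper (not Alpha's API): try UTF-8, then the fallback.

    Returns (text, encoding_used). The fallback is whatever the user chose;
    nothing here tries to guess the real encoding.
    """
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return data.decode(fallback), fallback


if __name__ == "__main__":
    # Same byte, two different letters, depending on the declared encoding.
    for enc in ("latin-1", "mac_roman"):
        ch = translate(SAMPLE, enc)
        print(enc, "U+%04X" % ord(ch), unicodedata.name(ch))

    # As UTF-8 the lone byte is rejected instead of being mistranslated.
    try:
        translate(SAMPLE, "utf-8")
    except UnicodeDecodeError as exc:
        print("utf-8 rejected:", exc.reason)

    # With macRoman as the secondary encoding, a Latin1 file is still
    # opened without complaint, but with the "wrong" letter.
    text, used = open_with_fallback(SAMPLE)
    print(used, "U+%04X" % ord(text))
```

The last call shows the trade-off mentioned in the thread: the fallback never fails for this byte, so a wrong choice of secondary encoding goes unnoticed rather than raising an error.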