Hello, first thanks for all the great work from a long term user (≈1997 on) !
A suggestion: would it be possible to have a "Default encoding : Check file on open"?
This "check file on open" would, upon opening a file, have Alpha check the file's encoding (e.g. with the BSD "file" command) and use that, and only open the "Encodings popup" if this fails.
At present I have many files in ISO-8859 and the pop-up is working rather hard. And I'm not inclined to convert them all to UTF-8 in order to avoid further potential problems : if Alpha can handle ISO-8859, why not silently continue to do so where appropriate is my thinking.
Cheers, James
p.s. not filed as "bug" nor "task" but as "RFE" which I hope means "suggestion".
p.s. to be extra clear : I don't want to change the file open pref. to default to "ISO-8859" either, since this would introduce popups for UTF-8 files I open instead.
Hi James,
thank you for the suggestion. The problem with the file command is that it is not sufficiently accurate and reliable. For instance, I tested it (with option -I to get a mime type) on a file encoded in macRoman and got:
text/plain; charset=unknown-8bit
I tested it on a file containing greek text written in ISO-8859-7 and got:
text/plain; charset=iso-8859-1
(sic, 8859-1, not 8859-7). Same wrong result with a russian text in ISO-8859-5.
I'm afraid this would lead to wrong decisions that would be quite frustrating for the user.
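The ambiguity is easy to reproduce without the file command. A minimal Python sketch (an illustration only, not Alpha code; the sample word is arbitrary) showing that bytes written in ISO-8859-7 are also perfectly valid ISO-8859-1, so nothing at the byte level tells the two apart:

```python
# Greek text saved as ISO-8859-7: every character becomes a single byte.
greek = "ΑΛΦΑ"
raw = greek.encode("iso8859_7")

# The very same bytes decode without any error as ISO-8859-1 (Latin1),
# because Latin1 assigns a character to every possible byte value.
as_latin1 = raw.decode("latin-1")

# Only the encoding that was actually used gives back the original text.
as_greek = raw.decode("iso8859_7")

print(raw, as_latin1, as_greek)
```

Any guess between the two therefore has to rely on statistics or language knowledge, not on the validity of the bytes.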
Hi,
I would like to suggest a more pragmatic solution,
namely a second option prefs setting.
For example, I would set UTF-8 as my first option
and MacRoman as my second option, and Alpha would
try first UTF-8, and if it doesn't work, try
MacRoman (without presenting the dialogue to ask
me to choose another encoding). It would work fine
most of the time. Only very rarely would I have to
open a windows file (and I would be willing to
deal with that case in a more manual way, if I
could have a more automatic treatment of all my
old files in MacRoman).
James would select ISO-8859 as second option,
and it would seem to do the job for him.
People having many files of different encodings
would simply leave the second option undefined,
to avoid automatic mistakes.
If it is deemed too unsafe to open files automatically
without really being sure, another possibility
would be to use the second-option pref simply
to pre-set the default item in the pop-up of
the dialogue. That would allow the user at least to
proceed very quickly by hitting OK by return,
without having to dig into the pop-up with the mouse.
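The two-step idea can be sketched in a few lines of Python. This is only an illustration of the proposal, not how Alpha implements anything: the function name decode_with_fallback and its parameters are hypothetical, and returning (None, None) stands for "show the encoding dialogue".

```python
from typing import Optional, Tuple


def decode_with_fallback(
    data: bytes,
    primary: str = "utf-8",
    secondary: Optional[str] = None,
) -> Tuple[Optional[str], Optional[str]]:
    """Return (text, encoding_used), or (None, None) if the user must choose."""
    try:
        return data.decode(primary), primary
    except UnicodeDecodeError:
        pass  # the first option failed; consider the second one

    if secondary is None:
        # Second option left undefined: no automatic guess is made.
        return None, None

    try:
        return data.decode(secondary), secondary
    except UnicodeDecodeError:
        return None, None


old_file = "déjà vu".encode("mac_roman")
print(decode_with_fallback(old_file, "utf-8", "mac_roman"))
```

For the more cautious variant, the same secondary value would only be used to pre-select the item in the pop-up instead of being applied silently.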
Cheers,
Joachim.
Last edit: Bernard Desgraupes 2020-11-20
Hi Joachim,
I agree, it is an elegant solution.
I'll try to implement it.
Dear all,
Perhaps that is indeed a solution that means less effort when opening older files. I would welcome having less to do when opening files.
However, I am not fully convinced that simply defining a default encoding is always what we want, for the following reasons (my default encoding being UTF-8):
(i) I nevertheless like to be reminded that the file is not encoded in UTF-8 but in, e.g., MacRoman, since I prefer to “convert” all my files sooner or later to UTF-8 whenever I modify them. Perhaps, when you implement this solution, a preference could be added that still warns me that the file was opened in a non-UTF-8 encoding. Anyone who does not want such reminders could set the preference to give no such warnings.
(ii) Another option might be a preference for automatic conversion, i.e. the file is opened without warning and silently converted to UTF-8, a choice you might have to confirm when you save it, so that you could refuse to save it in UTF-8 if you have second thoughts about this automatic conversion.
Perhaps the above arguments are not particularly valid, since I may not have fully understood what is proposed. In any case I have to admit that it is not particularly clear to me what is meant by "Default encoding : Check file on open". Does this mean I can define what my default encoding is (my understanding)? And do I have a preference for whether the mismatch triggers the current dialog or not? The latter might be my point (i) above.
Regards,
Andreas
ETH Zurich
Prof. em. Dr. Andreas Fischlin
IPCC Vice-Chair WGII
Systems Ecology - Institute of Biogeochemistry and Pollutant Dynamics
CHN E 24
Universitaetstrasse 16
8092 Zurich
SWITZERLAND
andreas.fischlin@env.ethz.ch
http://www.sysecol.ethz.ch/people/andreas.fischlin.hml
+41 44 633-6090 phone
+41 44 633-1136 fax
+41 79 595-4050 mobile
Last edit: Bernard Desgraupes 2020-11-20
If I understand correctly Joachim's proposal this SecondaryEncoding would be empty by default which would correspond to the current behavior (which is what Andreas prefers) but Joachim himself would set it to MacRoman, and James would set it to ISO-8859-1 (aka Latin1). So everybody would be happy.
But if Joachim thereafter tries to open a Latin1 file, this file will be silently opened in MacRoman, giving wrong characters for all the accented letters: he would have to use the Open File command and explicitly set the encoding to Latin1 in the dialog.
Dear Bernard,
I do not necessarily prefer the current behavior, since it is indeed a bit cumbersome. I would therefore also prefer a simpler, more automated approach. However, to really explain what I would prefer, I would need more clarity on what the proposal actually is, as I explained in my previous e-mail.
The only thing I can say at this point already is, (i) yes, I welcome some support for easier conversion to a desirable encoding such as UTF-8, (ii) but I wish to have sufficient control over the behavior, e.g. easy-to-dismiss warnings when a problematic conversion was made: no warning when I take a big step and transform many files in a particular manner, but a warning when I occasionally deal with a few files and would appreciate learning that a particular conversion is about to take place, so I can decide whether I want to go with it or not.
This seems to me to perhaps mean:
However, a preference to suppress all warnings, even when the encoding cannot be converted without error, I consider to be questionable (e.g. silently open a Latin1 file and giving wrong characters), at least please not by default. I prefer here clearly a warning and a dialog as currently offered.
Perhaps the solution is an even other one:
Regards,
Andreas
Last edit: Bernard Desgraupes 2020-11-21
Dear Andreas,
thank you for your suggestions which go way beyond the initial request contained in this Ticket.
As far as this ticket is concerned, I'd like to clarify what happens when Alpha reads a file. The file contains bytes and Alpha must translate (not convert) these bytes to letters. To make this possible, the user must specify the encoding (which serves as a translation table). If your file contains the byte 0xE9, here is what happens depending on the encoding declared by the user:
0xE9 is forbidden in UTF8.
So when I wrote that opening a Latin1 file in macRoman encoding yields "wrong" characters, what is wrong there is that the user misled Alpha. Alpha just does what it was told to do.
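To make the 0xE9 case concrete, here is a small Python sketch using Python's own codecs (not Alpha's translation tables). The same byte yields a different letter under each one-byte encoding, while a lone 0xE9 is rejected as UTF-8 because in UTF-8 that byte announces a three-byte sequence:

```python
byte = b"\xe9"

# One byte, two different "translation tables".
in_latin1 = byte.decode("latin-1")      # é
in_macroman = byte.decode("mac_roman")  # È

# As UTF-8 the byte on its own is not a valid sequence.
try:
    byte.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(in_latin1, in_macroman, utf8_ok)
```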
If you ask to open any file in any 1-byte encoding (like macRoman, Latin1, ISO-8859-7 for greek, KOI8 for russian, etc), there will never be an error message. OTOH, you may see an error message when the input encoding is UTF8 because some bytes are forbidden depending on their position in multi-byte sequences, so Alpha will tell you that your file can not be UTF8 because it contains invalid sequences.
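The difference between the two families can be checked directly. A minimal Python sketch, again relying on Python's codecs as a stand-in for Alpha's (the helper name is_valid_utf8 is made up for this example): Latin1 and macRoman accept every one of the 256 byte values, whereas UTF-8 validation can fail.

```python
def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes form well-formed UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


every_byte = bytes(range(256))

# These two one-byte encodings map all 256 values, so decoding never fails.
latin1_text = every_byte.decode("latin-1")
macroman_text = every_byte.decode("mac_roman")

# UTF-8 has rules about which bytes may follow which.
print(is_valid_utf8("é".encode("utf-8")))  # a proper two-byte sequence
print(is_valid_utf8(b"\xc3\x28"))          # lead byte followed by a wrong byte
```

This is why a failed UTF-8 read is informative, while a successful read in a one-byte encoding proves nothing about whether the encoding was the right one.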
The only "conversion" which takes place occurs when Alpha builds its internal buffer which contains only UTF16 two-byte sequences. The user should not be concerned by this: it is only an internal representation. When you save your modified file, Alpha performs the necessary backward conversion to write out the proper bytes in the desired output encoding.
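The read, edit, save cycle described above can be mimicked in Python. This is only a sketch of the principle: Alpha's actual buffer handling is not shown here, and the little-endian byte order used for the UTF-16 view is an assumption made for the example.

```python
# Bytes as they sit on disk in an old macRoman file.
raw_in = "café".encode("mac_roman")

# Reading: the declared input encoding translates bytes to characters.
text = raw_in.decode("mac_roman")

# Internal view: the same characters as UTF-16 code units (2 bytes each here).
internal = text.encode("utf-16-le")

# Saving: characters are written back in whatever output encoding is chosen.
raw_out = text.encode("utf-8")

print(raw_in, len(internal), raw_out)
```

The internal form never reaches the file: only the input and output encodings decide which bytes are read and written.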
The original request made in this ticket means that there should be a fallback mechanism when Alpha detects that the file is not valid UTF8: it suggests using some heuristics to guess what the encoding could be. Unfortunately there is no reliable method for detecting an encoding. James suggested using the file command line tool, which tries to guess the encoding, but it can easily be fooled and give wrong answers. I would not rely on it.
Joachim suggested that the user define a sort of secondary encoding (or fallback encoding) that Alpha would use silently when UTF8 fails: this could be useful, but it is still the user's responsibility to specify a secondary encoding that suits her needs. If nearly all your non-UTF8 files are in macRoman, it makes sense to set this secondary encoding to macRoman. But if the file was Latin1, well, too bad... you just misled Alpha.
Hello Bernard, thanks for the detailed summary and discussions which explain the problems well !
The solution this is converging towards (the secondary encoding, which is the user’s responsibility) makes perfect sense, and I’ll say no more.
All the best, James
Related
Tickets: #240
This is fixed now, implementing Joachim's solution. I have defined a new preference called Secondary Input Encoding that is the default choice offered if Alpha failed to open a file with the input encoding.
Changes committed to the repository (rev. 2003).