From: Petr P. <Pri...@sk...> - 2007-02-01 08:11:15
Matthieu Casanova asked:
> In fact why not reading that to choose the encoding
> like it is done for the xml encoding detection?

Marcelo Vanzin objected:
> I might be repeating myself here, but the problem with
> using encoding as a buffer-local property embedded in the
> buffer is the "chicken and egg" problem. What encoding do
> you use to read the encoding string?

Slava Pestov added:
> You're exactly right. The best thing would be for people to
> gradually transition to UTF16 and UTF8 and slowly phase out
> legacy encodings.

I think I understand Matthieu. He asks as a thinking user, and I dare say I feel the same.

Regarding the Unicode encodings and the "chicken and egg" problem: detecting the Unicode encoding format is ALSO detecting the encoding. If I know exactly which form of Unicode encoding is used, I can ignore any other attempt to prescribe the encoding explicitly through a special sequence or the like. As a user, I would not feel the need for an explicit prescription of the encoding in such a case. Even if the encoding were explicitly declared inside the file -- think of an older file that was modernized by converting it to UTF-16, for example -- I would know that the declaration can be read correctly (here as UTF-16) and checked against the encoding actually in use (possibly warning the user about a mismatch).

But not all files use a Unicode encoding. I dare say the majority of text files still do not, and even UTF-8 may go undetected if the file does not start with the initial mark bytes. In such a case, I still need to decide what encoding to use. The simplest way is to use jEdit's default encoding -- which may be wrong for that particular file.

The great value of jEdit is not that it will work perfectly in the future, but that it works nicely now. The future is, well, the future. Until then, we can improve the present, or the "near future".

To autodetect the encoding, I should first try to detect the encoding from the file.
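The detection order described here (Unicode signature bytes first, then an ASCII scan for a buffer-local property such as ":encoding=utf-8:", and only then the default) could be sketched roughly as below. This is just an illustration of the idea, not jEdit code; the class and method names are my own invention, and the pattern matched is an assumption about how the property might look.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch only -- not the jEdit API. */
public class EncodingSniffer {

    /**
     * Inspects the first bytes of a file and returns a detected
     * charset name, or null when nothing could be detected (the
     * caller would then fall back to the default encoding).
     */
    public static String detect(byte[] head) {
        // Stage 1: Unicode signature ("initial mark bytes") detection.
        if (head.length >= 2) {
            if ((head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) return "UTF-16BE";
            if ((head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) return "UTF-16LE";
        }
        if (head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        // Stage 2: no signature -- read the bytes as ASCII and look
        // for an embedded buffer-local property like ":encoding=...:"
        // (works even for UTF-8 without the leading bytes, because
        // the property itself is plain ASCII).
        String ascii = new String(head, StandardCharsets.US_ASCII);
        Matcher m = Pattern.compile(":encoding=([^:\\s]+):").matcher(ascii);
        if (m.find()) return m.group(1);
        // Stage 3: nothing found -- signal that the default applies.
        return null;
    }

    public static void main(String[] args) {
        byte[] bom = {(byte) 0xFE, (byte) 0xFF, 0, 'h'};
        System.out.println(detect(bom));                       // UTF-16BE
        System.out.println(detect(":encoding=iso-8859-2:".getBytes()));  // iso-8859-2
        System.out.println(detect("plain text".getBytes()));   // null
    }
}
```

A real implementation would of course only scan a bounded prefix of the buffer and would validate the returned name against the charsets the JVM actually supports.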
If UTF-8 without the leading bytes is used, I could still read the ":encoding=utf-8:" property as ASCII characters. Only when I am unable to autodetect the encoding should I use the default encoding. To be perfect, after deciding on the encoding and switching the buffer to it, jEdit could even check that there is no conflict with what is explicitly stated inside the file.

It is clear that the described functionality is not extremely simple, and it probably should not be part of the core. However, the core could be modified and a core-related plugin for encoding autodetection could be added (as in the LatestVersion Check or QuickNotepad case). Only when the plugin is not present or not activated should the default encoding be used. The plugin should have a generic part plus a mechanism that allows extension (similar to syntax highlighting modes, for example).

What is your opinion?

pepr