From: <dal...@is...> - 2007-02-01 13:44:32
Slava Pestov wrote:
> You're exactly right. The best thing would be for people to gradually
> transition to UTF16 and UTF8 and slowly phase out legacy encodings.
>
> Slava
>
> On 31-Jan-07, at 11:26 PM, Marcelo Vanzin wrote:
>
>> I might be repeating myself here, but the problem with using
>> encoding as a buffer-local property embedded in the buffer is the
>> "chicken and egg"

It is not really down to chickens and eggs ...

:encoding=windows-1250:

This line does not contain any "strange" characters, and it never should. (Problems will emerge with strange encodings like Chinese or something like that, and feathers will fly ...)

Now, a file containing that line can be encoded in some one-byte or multi-byte encoding. First, try to recognize the sequence ":encoding" in any one-byte encoding; if you succeed, you win. You have the encoding. No chickens, no eggs. No flu.

If you fail, try reading the file as if it were multi-byte encoded (UTF something) and proceed the same way as described above. If the sequence is not found after all the searches, it probably isn't in the file.

Yes, in the worst case you will parse the file several times, and if the file is big ... it might be a performance problem (:encoding can be placed at the end of the file, so you would read the whole file).

Now, if there were a config option "I want to use this", then users who want this feature could turn it on. No flame, no war.

I might be wrong about this, but I would love to have this feature. Normally I would try doing it myself, but my Java knowledge and experience are quite humble.

Tnx,
--
Dalibor Petricevic
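
The search order described above (one-byte first, then multi-byte) could be sketched in Java roughly as follows. This is only an illustration, not jEdit code: the class name EncodingSniffer, the method sniff, and the marker pattern are all made up for the example. It also assumes that, because the marker is plain ASCII, decoding as ISO-8859-1 is enough to stand in for "any one-byte encoding", and that UTF-16LE and UTF-16BE are the multi-byte encodings worth probing.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EncodingSniffer {
    // Hypothetical marker syntax: ":encoding=NAME:" with an ASCII-only name.
    private static final Pattern MARKER =
        Pattern.compile(":encoding=([A-Za-z0-9._-]+):");

    // Probes tried in order. ISO-8859-1 maps every byte to a character, so it
    // finds an ASCII marker in any ASCII-compatible encoding (assumption).
    private static final List<Charset> PROBES = List.of(
        StandardCharsets.ISO_8859_1,
        StandardCharsets.UTF_16LE,
        StandardCharsets.UTF_16BE);

    // Returns the encoding name from the first marker found, if any.
    // Worst case: the whole buffer is decoded once per probe.
    public static Optional<String> sniff(byte[] data) {
        for (Charset probe : PROBES) {
            // Malformed input is replaced, not rejected, so this never throws.
            String text = new String(data, probe);
            Matcher m = MARKER.matcher(text);
            if (m.find()) {
                return Optional.of(m.group(1));
            }
        }
        return Optional.empty();
    }
}
```

The returned name is only what the marker claims; a caller would still need to check that the name is a charset the JVM supports before reloading the buffer with it.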