From: Slava P. <sl...@fa...> - 2007-02-01 05:33:08

You're exactly right. The best thing would be for people to gradually
transition to UTF-16 and UTF-8 and slowly phase out legacy encodings.

Slava

On 31-Jan-07, at 11:26 PM, Marcelo Vanzin wrote:

> I might be repeating myself here, but the problem with using encoding
> as a buffer-local property embedded in the buffer is the "chicken and
> egg" problem: what encoding do you use to read the encoding string?
>
> XML parsing is not a very good example. If you look at the parser code
> in the JDK, it's really ugly. I had to fix it at my last job and I
> still have nightmares about it. :-) Basically, what it does is read
> the first few bytes, run a big "if then else" that checks whether that
> character is the "<" character in several different encodings, then
> try to parse using that encoding, and if that works, use the encoding
> that the XML declaration defines.
>
> This "works" for XML because the first character in an XML file
> (except for whitespace) always has to be a "<". But even then it's
> easy to get things wrong; try to parse an XML file encoded in UTF-16LE
> using the 1.4.2 JDK parser and watch it blow up (1.5 works fine, BTW).
>
> Trying to apply that to a file that doesn't have to respect any
> structure is, to say the least, very, very difficult. Even if most of
> the time you can get away with just treating everything as ASCII,
> there are always exceptions (the multi-byte Unicode encodings being
> examples of where treating things as ASCII would fail).
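
For readers who haven't seen it, the sniffing technique Marcelo describes can be sketched roughly like this. This is a hedged illustration in Java, not the JDK's actual parser code; the class and method names are invented. It checks for a byte-order mark, then for the byte pattern of "<" in a few encodings, and picks a provisional charset:

```java
import java.nio.charset.StandardCharsets;

// Minimal sketch of XML encoding autodetection: guess a charset from the
// first bytes of the stream. A real parser would then re-decode once it
// reads the encoding attribute in the XML declaration itself.
public class XmlEncodingSniffer {

    static String sniffEncoding(byte[] head) {
        // UTF-8 BOM: EF BB BF
        if (head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (head.length >= 2) {
            int b0 = head[0] & 0xFF, b1 = head[1] & 0xFF;
            if (b0 == 0xFE && b1 == 0xFF) return "UTF-16BE"; // BOM
            if (b0 == 0xFF && b1 == 0xFE) return "UTF-16LE"; // BOM
            if (b0 == 0x00 && b1 == 0x3C) return "UTF-16BE"; // 00 3C = '<'
            if (b0 == 0x3C && b1 == 0x00) return "UTF-16LE"; // 3C 00 = '<'
        }
        // '<' as a single byte: some ASCII-compatible encoding; assume UTF-8.
        if (head.length >= 1 && (head[0] & 0xFF) == 0x3C) {
            return "UTF-8";
        }
        return "UTF-8"; // fall back and hope the XML declaration corrects us
    }

    public static void main(String[] args) {
        byte[] utf8  = "<?xml version=\"1.0\"?>".getBytes(StandardCharsets.UTF_8);
        byte[] u16le = "<?xml version=\"1.0\"?>".getBytes(StandardCharsets.UTF_16LE);
        System.out.println(sniffEncoding(utf8));   // prints "UTF-8"
        System.out.println(sniffEncoding(u16le));  // prints "UTF-16LE"
    }
}
```

Note this only works because "<" is guaranteed to come first in XML; for an arbitrary buffer with no mandated structure there is no such anchor byte to key off, which is exactly the chicken-and-egg problem above.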