Re: [ jEdit-users ] Bug? Can you confirm it? (was Encoding autodetection for ... Python?)

Matthieu Casanova wrote:
> In fact why not reading that to choose the encoding like it is done for
> the xml encoding detection ?

I might be repeating myself here, but the problem with using encoding as
a buffer-local property embedded in the buffer is the "chicken and egg"
problem. What encoding do you use to read the encoding string?

XML parsing is not a very good example. If you look at the parser code
in the JDK, it's really ugly. I've had to fix it at my last job and I
still have nightmares about it. :-) Basically what it does is read the
first few bytes, run a big "if then else" and check whether that character
is the "<" character in several different encodings. It then tries to parse
using that encoding and, if that works, switches to the encoding that the
XML declaration defines.

This "works" for XML because the first character in an XML file (except
for whitespace) always has to be a "<". But even then it's easy to get
things wrong; try to parse an XML file encoded in UTF-16LE using the
1.4.2 JDK parser and watch it blow up (1.5 works fine, BTW).
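
To make the first-bytes check described above concrete, here is a minimal
sketch. It is not the JDK parser's code: the class and method names
(XmlEncodingSniffer, guess) are made up for illustration, and the byte
patterns cover only a few common cases (byte order marks and the "<?"
prefix of an XML declaration).

```java
/** Hypothetical helper for illustration; not the JDK parser's actual code. */
final class XmlEncodingSniffer {

    private XmlEncodingSniffer() {}

    /**
     * Guesses an encoding family from the first bytes of an XML stream.
     * Returns null when none of the known patterns match.
     */
    static String guess(byte[] head) {
        // Byte order marks, if present, settle the question first.
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";

        // Otherwise look for "<?" as it would appear in each encoding.
        if (startsWith(head, 0x00, 0x3C, 0x00, 0x3F)) return "UTF-16BE";
        if (startsWith(head, 0x3C, 0x00, 0x3F, 0x00)) return "UTF-16LE";

        // "<?xm" in any ASCII-compatible encoding; UTF-8 is only a
        // provisional choice until the XML declaration has been read.
        if (startsWith(head, 0x3C, 0x3F, 0x78, 0x6D)) return "UTF-8";

        return null;
    }

    /** True if data begins with the given unsigned byte values. */
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

The result is only a starting point: the parser still has to read the
declaration with the guessed encoding and then switch to whatever the
declaration names.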

Trying to apply that to a file that doesn't have to respect any
structure is, to say the least, very, very difficult. Even if most of
the time you can get away with just treating everything as ASCII, there
are always exceptions (the multi-byte Unicode encodings being examples
of where treating things as ASCII would fail).
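
A small sketch of that failure mode, under stated assumptions: the
"coding:" marker and the AsciiSniffDemo class are hypothetical stand-ins
for whatever in-file encoding declaration an editor might look for. The
line is encoded as it would be on disk, then read back one byte per
character, the way an "assume ASCII" scan would see it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo class; the "coding:" marker is an assumed convention. */
final class AsciiSniffDemo {

    private AsciiSniffDemo() {}

    /**
     * Encodes the line with the given on-disk charset, reads the bytes back
     * one byte per character, and reports whether the marker is still found.
     */
    static boolean findsMarker(String line, Charset onDisk) {
        byte[] raw = line.getBytes(onDisk);
        // ISO-8859-1 maps each byte to exactly one char, so it stands in
        // for a naive single-byte, ASCII-style read of the file.
        String naive = new String(raw, StandardCharsets.ISO_8859_1);
        return naive.contains("coding:");
    }
}
```

With a line such as "# -*- coding: utf-8 -*-", findsMarker succeeds when
the on-disk charset is UTF-8 but fails for UTF-16LE or UTF-16BE, because
every ASCII character is then interleaved with a zero byte.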

-- 
Marcelo Vanzin
va...@us...
"Life is too short to drink cheap beer"
