Re: [Python-markdown-discuss] UnicodeDecodeError


On Thu, Feb 26, 2009 at 02:57:16PM -0500, Waylan Limberg wrote:
> 
> Keep in mind that you just explained the various sources of your files
> earlier in your message. That particular situation is unique to you.
> Someone else will have a different situation. It is impossible for the
> markdown library to be able to anticipate every possible situation.
> Therefore, the most "robust way" is to leave the encoding/decoding to
> the end user - the only person in a position to properly address that
> specific situation.

Yes, I think python-markdown made the right decision. Actually, I wasn't 
complaining about python-markdown but about Python itself, although maybe 
there's nothing Python can do about it either. It would be nice if I could just 
use the standard open function and have it figure out the file's encoding so 
that I didn't have to.
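
Short of real detection, the closest thing I can think of is to try a few 
candidate encodings in order on the whole file. Here is a rough sketch, 
assuming Python 3; read_text and its default encoding list are names I made up 
for illustration, not anything in the standard library or in python-markdown:

```python
def read_text(path, encodings=("utf-8", "latin-1")):
    """Return the file's text, decoded with the first encoding that works.

    Hypothetical helper: the candidate list is a guess, not detection.
    """
    # Read raw bytes so that we control the decoding step ourselves.
    with open(path, "rb") as f:
        data = f.read()
    for enc in encodings:
        try:
            return data.decode(enc)
        except UnicodeDecodeError:
            continue  # this candidate failed, try the next one
    raise ValueError("none of the candidate encodings could decode %r" % path)
```

Note that latin-1 maps every byte to some character, so it never fails. Put 
it last, and treat a result that came from it as a guess rather than a 
confirmed encoding.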

I'm not sure what you can do if you have files like mine that apparently 
contain text in different encodings (I think Ron is exactly right that my file 
contained UTF-8 and some Latin-1 characters). Can you decode that at all? You'd 
have to write code to decode it one character at a time (if that's possible) 
using UTF-8, catch the UnicodeDecodeError on each character that fails, and try 
to decode that character with Latin-1 instead. In general, you'd need a list of 
all possible encodings in order of preference, and you'd try each encoding on 
each character in turn until you've decoded the whole file.
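
For the two-encoding case, something like the following might work. It is only 
a sketch, assuming Python 3 and assuming the stray bytes really are Latin-1. 
The handler name "latin1_fallback" and the function decode_mixed are my own 
inventions; codecs.register_error is the standard-library hook they rely on.

```python
import codecs


def _latin1_fallback(err):
    # Called by the UTF-8 decoder each time it hits bytes it cannot decode.
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    # Latin-1 maps every byte to a character, so this cannot fail.
    # Returning err.end tells the decoder where to resume.
    return bad.decode("latin-1"), err.end


# Register under a made-up name so it can be passed as errors=...
codecs.register_error("latin1_fallback", _latin1_fallback)


def decode_mixed(data):
    """Decode bytes as UTF-8, falling back to Latin-1 for invalid bytes."""
    return data.decode("utf-8", errors="latin1_fallback")


mixed = "café ".encode("utf-8") + "naïve".encode("latin-1")
print(decode_mixed(mixed))
```

The catch is that a run of Latin-1 bytes that happens to form a valid UTF-8 
sequence will be decoded as UTF-8, so the result is a best guess and not a 
guaranteed recovery of the original text.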