From: tchomby <tc...@go...> - 2009-02-27 10:08:48
On Thu, Feb 26, 2009 at 02:57:16PM -0500, Waylan Limberg wrote:
>
> Keep in mind that you just explained the various sources of your files
> earlier in your message. That particular situation is unique to you.
> Someone else will have a different situation. It is impossible for the
> markdown library to be able to anticipate every possible situation.
> Therefore, the most "robust way" is to leave the encoding/decoding to
> the end user - the only person in a position to properly address that
> specific situation.

Yes, I think python-markdown made the right decision. Actually I wasn't complaining about python-markdown but about python itself, although maybe there's nothing python can do about it either. I just think it would be nice if I could use the standard open function and have it figure out the encoding of the file so I didn't have to.

I'm not sure what you can do if you have files like mine that apparently contain text in different encodings (I think Ron is exactly right that my file contained utf8 and some latin-1 characters). Can you decode that at all? You'd have to write code to decode it one character at a time (if that's possible) using utf8, catch the UnicodeDecodeError on each character that fails, and try to decode that character with latin-1 instead. You'd need a list of all possible encodings in order of preference, and you'd try each encoding on each character in turn until you've decoded the whole file.
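
For what it's worth, the character-by-character idea could look something like the sketch below. It is only an illustration, not anything python or python-markdown provides: the function name decode_mixed and its parameters are made up here, it is written for Python 3 where file contents read in binary mode are bytes, and it assumes only two encodings are in play. It relies on the start and end attributes of UnicodeDecodeError to find the bytes that failed, decodes just those with the fallback, and carries on from there.

```python
def decode_mixed(data, primary="utf-8", fallback="latin-1"):
    """Decode bytes as `primary`, using `fallback` for byte runs that fail.

    Hypothetical helper for illustration; both encoding names are assumptions.
    """
    pieces = []
    pos = 0
    while pos < len(data):
        try:
            # Try to decode everything that is left in one go.
            pieces.append(data[pos:].decode(primary))
            break
        except UnicodeDecodeError as err:
            # err.start and err.end are offsets into the slice data[pos:].
            good_end = pos + err.start
            bad_end = pos + err.end
            # Everything before the failure was valid in the primary encoding.
            pieces.append(data[pos:good_end].decode(primary))
            # Decode only the offending bytes with the fallback encoding.
            pieces.append(data[good_end:bad_end].decode(fallback))
            pos = bad_end
    return "".join(pieces)


if __name__ == "__main__":
    # UTF-8 encoded e-acute followed by a latin-1 encoded i-diaeresis.
    sample = b"caf\xc3\xa9 na\xefve"
    print(decode_mixed(sample))
```

Two caveats. latin-1 assigns a character to every possible byte, so it never raises and has to come last in any list of fallbacks. And a run of latin-1 bytes that happens to form valid utf8 will be decoded as utf8 without complaint, so this is a best guess rather than a reliable way to recover the original text.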