From: Marco P. <mar...@gm...> - 2009-02-27 10:14:53
> Yes, I think python-markdown made the right decision. Actually I wasn't
> complaining about python-markdown but python, although maybe there's nothing
> python can do about it either, but I think it would be nice if I could just use
> the standard open function and it would figure out what the encoding of the
> file was so I didn't have to.

That can't be done with perfect determinism. And even if it could, it wouldn't solve the problem when a file has mixed encodings.

>
> I'm not sure what you can do if you have files like mine that apparently
> contain text in different encodings (I think Ron is exactly right that my file
> contained utf8 and some latin-1 characters). Can you decode that at all? You'd
> have to write code to decode it one character at a time (if that's possible)
> using utf8 and on each character catch the UnicodeDecodeError and try to decode
> the character with latin-1 instead. You'd have to have a list of all possible
> encodings in order of preference and try each encoding on each character in
> turn until you've decoded the whole file.

It's not possible in general, since a "character" doesn't correspond to the file's atomic unit (the byte), and in UTF-8 characters don't have a fixed length in terms of bytes. Moreover, to determine the encoding of a piece of text you need to look at many bytes. The only thing you can do is fix the encoding for the file (or try to guess it in some way), and ignore or replace substrings which don't map to that encoding.
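
To make that concrete, here is a rough sketch, assuming Python 3 and a made-up byte string (mostly UTF-8 with one stray Latin-1 byte). The sample data and the `guess_decode` helper are hypothetical, just for illustration. It shows the variable byte length of UTF-8 characters, the "ignore or replace" options, and why guessing one encoding for the whole file doesn't rescue mixed content.

```python
# Hypothetical sample: "caffe" with a grave accent encoded as UTF-8,
# followed by "cafe" with an acute accent encoded as a single Latin-1 byte.
data = "caff\u00e8: ".encode("utf-8") + b"caf\xe9"

# UTF-8 characters vary in length: "a" is one byte, U+00E8 is two.
one_byte = len("a".encode("utf-8"))
two_bytes = len("\u00e8".encode("utf-8"))

# Fix the encoding for the whole input and deal with what doesn't map to it.
replaced = data.decode("utf-8", errors="replace")  # bad byte becomes U+FFFD
ignored = data.decode("utf-8", errors="ignore")    # bad byte is dropped


def guess_decode(raw, encodings=("utf-8", "latin-1")):
    """Hypothetical helper: try each encoding on the whole input, in order."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding fits")


# Latin-1 maps every byte to some character, so the fallback always
# "succeeds", but the UTF-8 part of the input comes out garbled.
text, used = guess_decode(data)
```

With the sample above the fallback picks latin-1, which decodes the stray byte correctly and turns the two-byte UTF-8 sequence into two wrong characters. So the guess only helps when the whole file really is in a single encoding.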
Ciao,
Marco

--
Marco Pantaleoni