From: Artem Y. <ne...@gm...> - 2008-07-09 00:59:12
Yuri Takhteyev wrote:
> We could re-think our choice of placeholders if we know that this is
> the reason. But it sounds like elementTree is the way to go.
>
Yes, I agree.

> A few minor things. The current version in git fails on non-ASCII
> files (e.g., tests/misc/russian.txt). That's because we end up
> encoding the content too early: line 1889 writes etree to xml, utf8
> encoded, after which we try to run textPostProcessors on it. That's
> not good. This seems to fix it:
>
> xml = codecs.decode(etree.tostring(root, encoding="utf8"), "utf8")
>
Strange, I didn't notice it on my version. Thanks for the fix.

> (I am assuming that standard etree doesn't have an option of
> serializing to non-encoded unicode. If it does, use that instead.)
>
> Note that in my experience there is only one way to use Unicode right
> with Python: assume that all strings are unicode. So, for this
> reason, I've been following the policy of decoding data when it comes
> into my world and encoding it only when it comes out, without _ever_
> passing encoded strings around. Encoded strings are evil.
>
Thanks for the advice.

> Another thing: lots of tests seem to fail now because of whitespace
> differences. I am guessing that the way to solve it is to first
> extend test-markdown.py to add an option of reflowing XHTML before
> diffing. Then, once we know that all tests pass except for white
> space differences, we can change the expected output.
>
Maybe we should deal with the whitespace right away, since we'll need
to fix the failing tests anyway.
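
For reference, here is a minimal sketch of the two ideas discussed above, written for current Python 3 rather than the interpreter this thread was using. The helper names `serialize_to_text`, `normalize_whitespace` and `same_ignoring_whitespace` are made up for illustration and are not part of markdown.py or test-markdown.py. The whitespace helper only collapses runs of whitespace, which is a cruder stand-in for the XHTML reflowing proposed above, and it would also hide differences inside `<pre>` blocks.

```python
import re
import xml.etree.ElementTree as etree


def serialize_to_text(root):
    """Return the tree as a text (unicode) string, never as encoded bytes."""
    # Same idea as the fix quoted above: serialize to UTF-8 bytes and decode
    # immediately, so later text processing only ever sees unicode.
    # On Python 3, etree.tostring(root, encoding="unicode") skips the round trip.
    return etree.tostring(root, encoding="utf-8").decode("utf-8")


def normalize_whitespace(markup):
    """Collapse every run of whitespace into a single space (hypothetical helper)."""
    return re.sub(r"\s+", " ", markup).strip()


def same_ignoring_whitespace(expected, actual):
    """True if two outputs differ at most in the amount of whitespace."""
    return normalize_whitespace(expected) == normalize_whitespace(actual)


if __name__ == "__main__":
    root = etree.Element("p")
    root.text = "Привет, мир"
    text = serialize_to_text(root)
    print(type(text).__name__, text)
    print(same_ignoring_whitespace("<p>a</p>\n\n<p>b</p>", "<p>a</p>\n<p>b</p>\n"))
```

With something like this, a test run could first report which tests fail only on whitespace, and the expected output files could be updated afterwards.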