Re: [Python-markdown-discuss] GSoC ElementTree support


Yuri Takhteyev wrote:
> We could re-think our choice of placeholders if we know that this is
> the reason.  But it sounds like elementTree is the way to go.
>   

Yes, I agree.

> A few minor things.  The current version in git fails on non-ASCII
> files (e.g., tests/misc/russian.txt).  That's because we end up
> encoding the content too early: line 1889 writes etree to xml, utf8
> encoded, after which we try to run textPostProcessors on it.  That's
> not good.  This seems to fix it:
>
>         xml = codecs.decode(etree.tostring(root, encoding="utf8"), "utf8")
>   

Strange, I didn't notice it on my version. Thanks for the fix.
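
For reference, here is a minimal sketch of that fix on a toy tree. It assumes Python 3's xml.etree.ElementTree, which may not match the etree and Python version used in git; the element and sample text are made up for illustration. On Python 3, passing encoding="unicode" to tostring() returns a str directly, so the decode step is only needed where that option is unavailable.

```python
import codecs
import xml.etree.ElementTree as etree

# Hypothetical sample tree with non-ASCII (Russian) text.
root = etree.Element("p")
root.text = "Привет, мир"

# tostring() with a byte encoding returns an encoded byte string ...
encoded = etree.tostring(root, encoding="utf8")

# ... so decode it again before any text post-processing runs on it.
xml = codecs.decode(encoded, "utf8")

# Python 3 only: ask for a str directly and skip the round trip.
xml_direct = etree.tostring(root, encoding="unicode")
```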

> (I am assuming that standard etree doesn't have an option of
> serializing to non-encoded unicode.  If it does, use that instead.)
>
> Note that in my experience there is only one way to use Unicode right
> with Python: assume that all strings are unicode.  So, for this
> reason, I've been following the policy of decoding data when it comes
> into my world and encoding it only when it comes out, without _ever_
> passing encoded strings around.  Encoded strings are evil.
>   

Thanks for the advice.
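
As I understand the policy, it could be sketched like this (Python 3 syntax; convert() and convert_stream() are hypothetical names for illustration, not functions from the markdown code). Bytes are decoded once on the way in and encoded once on the way out, and everything in between sees only unicode strings.

```python
import io

def convert(text):
    """Stand-in for the real conversion; takes and returns str only."""
    return text.upper()

def convert_stream(src, dst, encoding="utf-8"):
    # Decode once, as the data enters.
    text = src.read().decode(encoding)
    result = convert(text)
    # Encode once, as the data leaves.
    dst.write(result.encode(encoding))

# Byte streams stand in for the input and output files.
src = io.BytesIO("привет".encode("utf-8"))
dst = io.BytesIO()
convert_stream(src, dst)
```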

> Another thing: lots of tests seem to fail now because of whitespace
> differences.  I am guessing that the way to solve it is to first
> extend test-markdown.py to add an option of reflowing XHTML before
> diffing.  Then, once we know that all tests pass except for white
> space differences, we can change the expected output.
>

Maybe we should deal with the whitespace right away, since we'll need to
fix the failing tests anyway.
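
Either way, a whitespace-insensitive comparison could look roughly like the sketch below. The names reflow() and same_ignoring_whitespace() are hypothetical and nothing like this exists in test-markdown.py yet. It is deliberately crude: it also discards whitespace that is significant, such as inside <pre> or between inline elements, so it is only good for telling whitespace-only failures apart from real ones.

```python
import re

def reflow(xhtml):
    # Drop whitespace between adjacent tags, then collapse remaining runs.
    xhtml = re.sub(r">\s+<", "><", xhtml.strip())
    return re.sub(r"\s+", " ", xhtml)

def same_ignoring_whitespace(expected, actual):
    # True when the two documents differ only in whitespace.
    return reflow(expected) == reflow(actual)

expected = "<ul>\n<li>one</li>\n<li>two</li>\n</ul>\n"
actual = "<ul><li>one</li>  <li>two</li></ul>"
only_whitespace_differs = same_ignoring_whitespace(expected, actual)
```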