Re: [Python-markdown-discuss] GSoC ElementTree support


On Tue, Jul 8, 2008 at 1:44 AM, Yuri Takhteyev <qar...@gm...> wrote:
>  But it sounds like elementTree is the way to go.
>

I agree. It doesn't appear that lxml adds any real value. Add in the
trouble installing it, and I doubt many would ever use it. I'd say
leave it out for now. If things improve in the future, it won't be
that hard to add it back in.

> Another thing: lots of tests seem to fail now because of whitespace
> differences.  I am guessing that the way to solve it is to first
> extend test-markdown.py to add an option of reflowing XHTML before
> diffing.  Then, once we know that all tests pass except for white
> space differences, we can change the expected output.

Interestingly, I had started work on this at some point, but never got
very far. My intended approach was to feed the output and expected
output both into an x/html parser, normalize whitespace, and then diff
the output of each. Thing is, I couldn't find a Python tool that
actually did that. Well, there is always BeautifulSoup, but that could
very easily alter some of the html and hide bugs - defeating the
purpose of testing. Considering that whitespace is insignificant in
x/html and the number of x/html tools available in Python, you'd think
whitespace normalization would be a standard feature. Ah well.

I thought about doing a simple whitespace normalization on the string
using string.replace or re.sub. But then we'd lose all linebreaks so
that the entire doc is on one line. That's kind of hard to diff.
Looping through a dom and normalizing on each string was more than I
wanted to do.
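
For what it's worth, here is a rough sketch of what the per-node
approach could look like using the standard library's ElementTree and
difflib. The helper names (normalized_lines, diff_xhtml) are made up
for illustration and are not part of any existing test script. It
assumes the output is well-formed XHTML (undeclared entities such as
&nbsp; would make the parser fail), and it leaves the contents of
<pre> untouched since whitespace matters there. Note that stripping
text also hides a difference like "foo <em>" versus "foo<em>", so this
is deliberately lenient.

```python
import difflib
import re
import xml.etree.ElementTree as ET

_WS = re.compile(r"\s+")


def _collapse(text):
    """Collapse whitespace runs to one space; whitespace-only text becomes None."""
    if text is None:
        return None
    return _WS.sub(" ", text).strip() or None


def _walk(elem, depth, lines, in_pre):
    """Append one line per start tag, text chunk, and end tag."""
    indent = "  " * depth
    attrs = "".join(' %s="%s"' % (k, v) for k, v in sorted(elem.attrib.items()))
    lines.append("%s<%s%s>" % (indent, elem.tag, attrs))
    in_pre = in_pre or elem.tag == "pre"  # keep whitespace inside <pre>
    text = elem.text if in_pre else _collapse(elem.text)
    if text:
        lines.append("%s  %r" % (indent, text))
    for child in elem:
        _walk(child, depth + 1, lines, in_pre)
        tail = child.tail if in_pre else _collapse(child.tail)
        if tail:
            lines.append("%s  %r" % (indent, tail))
    lines.append("%s</%s>" % (indent, elem.tag))


def normalized_lines(fragment):
    """Parse an XHTML fragment and return a whitespace-normalized line list."""
    # Wrap in a dummy root because the output usually has several
    # top-level elements, which is not a well-formed document by itself.
    root = ET.fromstring("<root>%s</root>" % fragment)
    lines = []
    _walk(root, 0, lines, False)
    return lines


def diff_xhtml(expected, actual):
    """Return unified-diff lines; an empty list means only whitespace differed."""
    return list(difflib.unified_diff(
        normalized_lines(expected), normalized_lines(actual),
        "expected", "actual", lineterm=""))
```

Because every tag and text chunk ends up on its own line, the diff
stays readable instead of collapsing the whole doc onto one line.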

I then found lxml's htmldiff tool [1], which provided an easy
(better??) way to compare html docs, but it still got hung up on some
(not all) whitespace. Additionally, it didn't exactly provide an easily
readable output to display in the test output. If you're interested, I
can forward the code I have - that is, if I can find it.

What I'd consider doing is taking the most recent markdown with
NanoDom, altering NanoDom's whitespace to match ET, and running a
little script that loops through all the tests and outputs new
expected html files. It shouldn't be all that hard.
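
Something along these lines is what I have in mind for that script.
This is only a sketch: the function name, the ".txt" source / ".html"
expected-output naming, and passing the converter in as a callable are
all assumptions for illustration, not how the current test suite is
necessarily laid out.

```python
import os


def regenerate_expected(test_dir, convert, src_ext=".txt", out_ext=".html"):
    """Rewrite the expected-output file next to every test source file.

    `convert` is any callable that takes markdown text and returns html.
    Returns the list of files that were written.
    """
    written = []
    for dirpath, _dirs, files in os.walk(test_dir):
        for name in sorted(files):
            base, ext = os.path.splitext(name)
            if ext != src_ext:
                continue  # not a test source file
            src = os.path.join(dirpath, name)
            out = os.path.join(dirpath, base + out_ext)
            with open(src, encoding="utf-8") as f:
                html = convert(f.read())
            with open(out, "w", encoding="utf-8") as f:
                f.write(html)
            written.append(out)
    return written
```

It could then be called as regenerate_expected("tests",
markdown.markdown), with "tests" standing in for wherever the test
files actually live. The new files would still need a review before
being committed, since this overwrites the old expected output.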

[1]: http://codespeak.net/lxml/lxmlhtml.html#html-diff
-- 
----
Waylan Limberg
wa...@gm...