From: Waylan L. <wa...@gm...> - 2008-07-08 13:52:13
On Tue, Jul 8, 2008 at 1:44 AM, Yuri Takhteyev <qar...@gm...> wrote:
> But it sounds like elementTree is the way to go.

I agree. It doesn't appear that lxml adds any real value. Add in the trouble installing it, and I doubt many would ever use it. I'd say leave it out for now. If things improve in the future, it won't be that hard to add it back in.

> Another thing: lots of tests seem to fail now because of whitespace
> differences. I am guessing that the way to solve it is to first
> extend test-markdown.py to add an option of reflowing XHTML before
> diffing. Then, once we know that all tests pass except for white
> space differences, we can change the expected output.

Interestingly, I had started work on this at some point, but never got very far. My intended approach was to feed both the output and the expected output into an x/html parser, normalize whitespace, and then diff the results. Thing is, I couldn't find a Python tool that actually did that. Well, there is always BeautifulSoup, but it could very easily alter some of the HTML and hide bugs, which defeats the purpose of testing. Considering that whitespace is insignificant in x/html, and given the number of x/html tools available in Python, you'd think whitespace normalization would be a standard feature. Ah well.

I thought about doing a simple whitespace normalization on the string using string.replace or re.sub. But then we'd lose all line breaks, so the entire doc would end up on one line. That's kind of hard to diff. Looping through a DOM and normalizing each string was more than I wanted to do. I then found lxml's htmldiff tool [1], which provided an easy (better??) way to compare HTML docs, but it still hung up on some (not all) whitespace. Additionally, its output wasn't easy to read when displayed in the test results. If you're interested, I can forward the code I have, that is, if I can find it.
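For illustration, here is a minimal sketch of the "parse, normalize whitespace, then diff" approach using only the standard library (xml.etree.ElementTree and difflib). It is not code from test-markdown.py, and the function names are hypothetical. It assumes the Markdown output is well-formed XHTML (named HTML entities such as `&nbsp;` would make the XML parser fail) and wraps the fragment in a dummy `<div>` root. Emitting one tag or text node per line avoids the "whole doc on one line" problem of a plain re.sub. Note the simplification: each text node is trimmed, so it can also mask real whitespace differences between inline elements.

```python
import difflib
import re
import xml.etree.ElementTree as ET

_WS = re.compile(r"\s+")


def _norm(text):
    """Collapse whitespace runs to a single space and trim; None -> ''."""
    return _WS.sub(" ", text or "").strip()


def _lines(elem, depth=0):
    """Yield one indented line per tag and per non-blank text node."""
    pad = "  " * depth
    attrs = "".join(f' {k}="{v}"' for k, v in sorted(elem.attrib.items()))
    yield f"{pad}<{elem.tag}{attrs}>"
    if _norm(elem.text):
        yield pad + "  " + _norm(elem.text)
    for child in elem:
        yield from _lines(child, depth + 1)
        # A child's tail is text that belongs to this element, after the child.
        if _norm(child.tail):
            yield pad + "  " + _norm(child.tail)
    yield f"{pad}</{elem.tag}>"


def normalized_lines(xhtml):
    """Parse an XHTML fragment (wrapped in a dummy root) into normalized lines."""
    root = ET.fromstring(f"<div>{xhtml}</div>")
    return list(_lines(root))


def whitespace_insensitive_diff(actual, expected):
    """Return a unified diff of the normalized trees; [] means they match."""
    return list(difflib.unified_diff(
        normalized_lines(expected), normalized_lines(actual),
        fromfile="expected", tofile="actual", lineterm=""))


if __name__ == "__main__":
    out = "<p>Hello   <em>world</em>\n</p>\n\n<ul>\n<li>one</li>\n</ul>"
    exp = "<p>Hello <em>world</em></p><ul><li>one</li></ul>"
    print(whitespace_insensitive_diff(out, exp) or "no differences")
```

An empty diff means the two documents differ only in whitespace, and any remaining difference shows up as a short, readable block of `-`/`+` lines rather than one giant changed line.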
What I'd consider doing is actually taking the most recent markdown with NanoDom, altering NanoDom's whitespace handling to match ET, and then running a little script that loops through all the tests and outputs new expected HTML files. It shouldn't be all that hard.

[1]: http://codespeak.net/lxml/lxmlhtml.html#html-diff

--
Waylan Limberg
wa...@gm...