From: Gerry L. <gjl...@gm...> - 2010-08-30 02:33:26
I've looked into this bug and have worked up a potential fix that passes the regression test suite. However, I think it needs more rigorous testing.

First, an explanation (see [here][0] for the bug description):

The problem boils down to the end-tag search in the `HtmlBlockPreprocessor` class. Specifically, the `_get_right_tag` method uses `block.rfind` to find the last match in the string for an end tag. This works well for nested tags such as:

    <div>
        <div>
            <div>
            foo
            </div>
        </div>
    </div>

This case is covered pretty exhaustively in the test suite. The edge case that trips things up is when a single block of text being processed contains consecutive HTML tags, and in particular when the second tag is buried inside another block tag. Consider the example text from the bug report:

    <p>foo</p>
    <ul>
    <li>
    <p>bar</p>
    </li>
    </ul>

The search returns the end p-tag in the middle of the list, so the next round of processing starts with the end tags for the list. Adding a blank line between the paragraph and the list fixes the problem because it forces the paragraph to be processed separately from the list. The end tag for the list's paragraph then cannot be confused with the end tag for the first paragraph.

The fix I've got replaces the `block.rfind` with a custom search function. This seemed to be the least invasive approach, and it currently passes the regression test suite. However, I want to exercise it more thoroughly, and I think some "in-the-wild" examples would be a good way to gain confidence that this fix won't break current installs. So any offers of such examples would be appreciated. I'll build up some other tests on my own in the meantime. Other suggestions are welcome as well.

[0]: http://www.freewisdom.org/projects/python-markdown/Tickets/000062

Regards-
Gerry LaMontagne
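
P.S. To make the failure concrete, here is a minimal sketch of the difference between `rfind` and a search that tracks nesting depth. This is not the actual patch: `find_closing_tag` is a hypothetical helper name, and the regex-based depth counting is only an assumption about one way such a custom search could work. It ignores self-closing tags, comments, and attribute values containing `>`.

```python
import re


def find_closing_tag(block, tag, start=0):
    """Return the index just past the end tag that closes the first
    <tag> opened at or after `start`, or -1 if no such end tag exists.

    Hypothetical helper for illustration only; not part of Python-Markdown.
    """
    # Matches both <tag ...> and </tag>; group 1 is "/" for an end tag.
    pattern = re.compile(r"<(/?)%s\b[^>]*>" % re.escape(tag), re.IGNORECASE)
    depth = 0
    for match in pattern.finditer(block, start):
        if match.group(1):
            # End tag: only meaningful if something is currently open.
            if depth > 0:
                depth -= 1
                if depth == 0:
                    return match.end()
        else:
            # Start tag: one level deeper.
            depth += 1
    return -1


# Example text from the bug report, with no blank line between the blocks.
block = "<p>foo</p>\n<ul>\n<li>\n<p>bar</p>\n</li>\n</ul>"

# Last-match search: lands on the </p> inside the list.
naive_end = block.rfind("</p>") + len("</p>")

# Depth-aware search: stops at the </p> that closes the first <p>.
balanced_end = find_closing_tag(block, "p")

print(repr(block[:naive_end]))
print(repr(block[:balanced_end]))
```

With the last-match search, everything up to the list's inner `</p>` is treated as one paragraph block and the remainder starts at the list's end tags, which is the behaviour described in the ticket. The depth-aware search still returns the outermost end tag for the nested `<div>` case, so that case should be unaffected.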