I've looked into this bug and have worked up a potential fix that passes
the regression test suite. However, I think it needs more rigorous
testing.
First, an explanation (see [here][0] for the bug description):
The problem boils down to the end-tag search in the
`HtmlBlockPreprocessor` class. Specifically, the `_get_right_tag`
method uses `block.rfind` to find the last occurrence of an end tag in
the string. This works well for nested tags such as:
    <div>
    <div>
    <div>
    foo
    </div>
    </div>
    </div>
This case is covered pretty exhaustively in the test suite.
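To make this concrete, here is the same idea with a plain Python string. It is a simplified stand-in that mimics only the `rfind` call, not the rest of `_get_right_tag`. In a properly nested block, the last `</div>` is always the one that closes the outermost `<div>`:

```python
# The nested example above, as a single block of text.
block = "<div>\n<div>\n<div>\nfoo\n</div>\n</div>\n</div>"

# rfind scans from the right, so it finds the *last* "</div>",
# which is the one closing the outermost <div>.
end = block.rfind("</div>")
print(end + len("</div>") == len(block))  # True
```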
The edge case that trips things up is when a single block of text
contains consecutive HTML block elements and a second tag of the same
type as the first is buried inside the following element. Consider the
example text from the bug report:
    <p>foo</p>
    <ul>
    <li>
    <p>bar</p>
    </li>
    </ul>
Because `rfind` searches from the end of the block, it returns the `</p>`
inside the list rather than the one that closes the first paragraph. As
a result, the next round of processing starts with the list's end tags.
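The same plain-string approximation (again only the `rfind` call, not the full method) shows the mismatch on the bug-report text:

```python
block = "<p>foo</p>\n<ul>\n<li>\n<p>bar</p>\n</li>\n</ul>"

# The end tag we actually want: the one right after "foo".
first_close = block.find("</p>")
# What rfind returns instead: the last "</p>", inside the list item.
last_close = block.rfind("</p>")

print(first_close)  # 6
print(last_close)   # 27
# Whatever follows the chosen end tag is left for the next round.
print(repr(block[last_close + len("</p>"):]))  # '\n</li>\n</ul>'
```

Everything left over after the inner `</p>` is just the list's closing tags.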
Adding a blank line between the paragraph and the list fixes the problem
because it forces the paragraph to be processed separately from the
list. Thus, the end-tag for the list paragraph does not get confused
with the end-tag for the first paragraph.
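A rough sketch of why the blank line helps, assuming for illustration that the block splitting amounts to splitting on a blank line (the preprocessor's actual splitting logic may differ):

```python
text = "<p>foo</p>\n\n<ul>\n<li>\n<p>bar</p>\n</li>\n</ul>"

# Assumption: a blank line separates the text into independent blocks.
first, second = text.split("\n\n")

# Within the first block, the last "</p>" is now the correct one.
print(first.rfind("</p>") + len("</p>") == len(first))  # True
# The list is processed on its own and ends with its own closing tag.
print(second.endswith("</ul>"))  # True
```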
The fix I've got replaces `block.rfind` with a custom search function.
This approach seemed the least invasive, and it currently passes the
regression test suite. However, I'd like to exercise it more thoroughly,
and I think some "in-the-wild" examples would be a good way to gain
confidence that this fix won't break current installs. So any offers of
sample documents would be appreciated.
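The patch itself isn't shown here, so the following is only a hypothetical sketch of what a nesting-aware search could look like, not the fix itself. The helper name `find_matching_end_tag` is made up, and it assumes `start` points at the opening tag. Instead of taking the last end tag in the block, it counts matching open and close tags and stops when the count returns to zero:

```python
import re


def find_matching_end_tag(block, tag, start=0):
    """Hypothetical helper: return the index of the end tag that closes
    the `tag` element opened at `start`, or -1 if it is never closed.

    Assumes `block[start:]` begins with the opening tag.
    """
    # Matches "<tag ...>" and "</tag ...>"; \b keeps "p" from matching "pre".
    pattern = re.compile(r"<(/?)%s\b[^>]*>" % re.escape(tag), re.IGNORECASE)
    depth = 0
    for match in pattern.finditer(block, start):
        if match.group(1):  # closing tag
            depth -= 1
            if depth == 0:
                return match.start()
        else:  # opening tag
            depth += 1
    return -1


block = "<p>foo</p>\n<ul>\n<li>\n<p>bar</p>\n</li>\n</ul>"
print(find_matching_end_tag(block, "p"))  # 6, the first paragraph's end tag
print(block.rfind("</p>"))                # 27, the tag inside the list
```

A depth counter like this does not try to handle things such as HTML comments that contain tags, or self-closing forms like `<p/>`, which is exactly the sort of case where in-the-wild examples would help.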
I'll build up some other tests on my own in the meantime. Other
suggestions are welcome as well.
[0]: http://www.freewisdom.org/projects/python-markdown/Tickets/000062
Regards-
Gerry LaMontagne