Aaron and Anand found a serious bug in one of the regular
expressions that causes python markdown to hang on certain inputs. This
email is to announce a tentative solution and to check if anyone has
better ideas.
The attached script.py is the test case that Aaron and Anand sent in;
script2.py is my variation on it, which just includes the relevant parts of
markdown.py. The regular expression in question is supposed to catch
patterns like [title][link_id]. The catch is that it is supposed to
allow for balanced brackets in the title part, i.e., one should be
able to write [one [[[[[two]]]]] three][link_id].
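For readers without script2.py at hand, here is a minimal sketch of the kind of pattern involved. It assumes NOBRACKET is r'[^\]\[]*' (a possibly empty run of non-bracket characters) and that the link id follows as optional whitespace plus [...]. make_brk is a hypothetical helper that rebuilds BRK with the nesting depth as a parameter, and REFERENCE_RE is used here only as an illustrative name.

```python
import re

# Assumption: NOBRACKET matches a (possibly empty) run of characters that
# are neither '[' nor ']'.
NOBRACKET = r'[^\]\[]*'


def make_brk(depth=6):
    """Build the depth-limited bracket pattern (hypothetical helper).

    The first half opens `depth` optional groups, each starting with '[';
    the second half closes them with ']', so the title may contain
    balanced brackets nested at most `depth` levels deep.
    """
    return (r'\['
            + '('
            + (NOBRACKET + r'(\[' + NOBRACKET) * depth
            + (NOBRACKET + r'\])*' + NOBRACKET) * depth
            + NOBRACKET + r')\]')


# Assumption: a reference link is BRK, optional whitespace, then [link_id].
REFERENCE_RE = re.compile(make_brk() + r'\s*\[(?P<id>[^\]]*)\]')

m = REFERENCE_RE.search("see [one [[[[[two]]]]] three][link_id] here")
print(m.group(1))     # one [[[[[two]]]]] three  (outermost group = title)
print(m.group("id"))  # link_id

# Seven levels of nesting inside the title exceed the depth-6 limit.
too_deep = "[" + "[" * 7 + "x" + "]" * 7 + "][id]"
print(REFERENCE_RE.search(too_deep))  # None
```

Passing a larger depth to make_brk accepts deeper nesting, but the pattern grows with every level, which is the size problem described next.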
To the best of my knowledge, python's regular expressions do not
support matching balanced brackets, so markdown.py approximates this
with a rather ugly regular expression (see script2.py) that catches
matching brackets up to 6 levels deep. The resulting expression is
quite big, so it is perhaps not too surprising that it causes
problems on long paragraphs. I found a fix for this particular
problem:
Replace
    + (NOBRACKET + r'(\['+NOBRACKET)*6
    + (NOBRACKET+ r'\])*'+NOBRACKET)*6
with
    + (NOBRACKET + r'(\[')*6
    + (NOBRACKET+ r'\])*')*6
in the definition of "BRK", which, as far as my test suite shows, does
not cause any problems (and really shouldn't, since the second
NOBRACKET was redundant in both cases: two NOBRACKETs in a row match
exactly the same text as one).
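The claim that the change is safe can be spot-checked by building both versions of BRK and comparing their matches. This is a minimal sketch, assuming NOBRACKET is r'[^\]\[]*' and that the link id follows as optional whitespace plus [...]; brk_original, brk_fixed and summary are hypothetical helpers, and the sample strings are made up for illustration rather than taken from the test suite. It only compares results on a few inputs and does not time anything.

```python
import re

# Assumption: NOBRACKET is a (possibly empty) run of non-bracket characters.
NOBRACKET = r'[^\]\[]*'


def brk_original(depth=6):
    # BRK as currently written: each repetition ends with NOBRACKET, so
    # two NOBRACKET runs end up side by side.
    return (r'\[('
            + (NOBRACKET + r'(\[' + NOBRACKET) * depth
            + (NOBRACKET + r'\])*' + NOBRACKET) * depth
            + NOBRACKET + r')\]')


def brk_fixed(depth=6):
    # Proposed BRK: the trailing NOBRACKET is dropped from each repetition.
    return (r'\[('
            + (NOBRACKET + r'(\[') * depth
            + (NOBRACKET + r'\])*') * depth
            + NOBRACKET + r')\]')


ID = r'\s*\[(?P<id>[^\]]*)\]'  # assumed link-id suffix
OLD = re.compile(brk_original() + ID)
NEW = re.compile(brk_fixed() + ID)

samples = [
    "[one [[[[[two]]]]] three][link_id]",
    "a [b] and [c [d] e] [f]",
    "[x][y] then [p [q] r] [s]",
    "no brackets here at all",
    "[unclosed [title][id]",
]


def summary(m):
    """Span, title and link id of a match, or None if nothing matched."""
    if m is None:
        return None
    return m.span(), m.group(1), m.group("id")


for text in samples:
    same = summary(OLD.search(text)) == summary(NEW.search(text))
    print(same, summary(NEW.search(text)))
```

As for why the change helps: adjacent NOBRACKET runs accept the same text as a single run but give the engine many ways to split it, and all of those splits are tried before a match attempt is abandoned. The fix removes most such pairs, though an optional group that matches zero times can still leave two runs next to each other.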
However, I thought I would use this as an opportunity to check if
anyone has any suggestions for a better way to handle this.
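One possible direction, offered only as a sketch: skip the regular expression for this pattern and pair brackets with an explicit stack in a single left-to-right pass, which handles any nesting depth and involves no backtracking. The helpers match_brackets and find_reference are hypothetical, not part of markdown.py, and the link-id rule used here (optional whitespace, then '[', up to the first ']') is an assumption about what BRK's caller expects.

```python
def match_brackets(text):
    """Map the index of each '[' to the index of its matching ']'.

    One left-to-right pass with a stack; unmatched brackets are skipped.
    """
    stack, pairs = [], {}
    for pos, ch in enumerate(text):
        if ch == '[':
            stack.append(pos)
        elif ch == ']' and stack:
            pairs[stack.pop()] = pos
    return pairs


def find_reference(text, start=0):
    """Return (title, link_id, end) for the first [title][link_id] at or
    after `start`, or None. Hypothetical helper, not markdown.py's API.
    """
    pairs = match_brackets(text)
    for open_pos in sorted(p for p in pairs if p >= start):
        close_pos = pairs[open_pos]
        k = close_pos + 1
        # Allow optional whitespace between the title and the link id.
        while k < len(text) and text[k].isspace():
            k += 1
        if k < len(text) and text[k] == '[':
            # Assumption: the link id runs up to the first ']' (no nesting).
            id_end = text.find(']', k + 1)
            if id_end == -1:
                return None  # no ']' later on, so nothing further can match
            return text[open_pos + 1:close_pos], text[k + 1:id_end], id_end + 1
    return None


print(find_reference("see [one [[[[[two]]]]] three][link_id] here"))
# ('one [[[[[two]]]]] three', 'link_id', 38)
```

Unlike BRK this puts no limit on nesting depth, but it is not guaranteed to agree with BRK on every input, so it would need to pass the existing test suite before replacing anything.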
Does this warrant a new release?
- yuri
--
Yuri Takhteyev
Ph.D. Candidate, UC Berkeley School of Information
http://takhteyev.org/, http://www.freewisdom.org/