From: Yuri T. <qar...@gm...> - 2007-10-08 04:03:49
Aaron and Anand found a serious bug in one of the regular expressions that causes python markdown to hang on certain input. This email is to announce a tentative solution and to check if anyone has better ideas.

The attached script.py is the test case that Aaron and Anand sent in; script2.py is my variation on it, which includes just the relevant parts of markdown.py.

The regular expression in question is supposed to catch patterns like [title][link_id]. The catch is that it is supposed to allow for balanced brackets in the title part, i.e., one should be able to write

    [one [[[[[two]]]]] three][link_id]

To the best of my knowledge, python's regular expressions do not support matching balanced brackets, so markdown.py approximates this with a rather ugly regular expression (see script2.py) which catches matching brackets up to 6 levels deep. The resulting expression is quite big, so it is perhaps not too surprising that it causes problems on long paragraphs.

I found a fix for this particular problem: replace

    + (NOBRACKET + r'(\['+NOBRACKET)*6
    + (NOBRACKET+ r'\])*'+NOBRACKET)*6

with:

    + (NOBRACKET + r'(\[')*6
    + (NOBRACKET+ r'\])*')*6

in the definition of "BRK". As far as my test suite shows, this does not cause any problems (and really shouldn't, since the second NOBRACKET was redundant in both cases).

However, I thought I would use this as an opportunity to check if anyone has any suggestions for a better way to handle this.

Does this warrant a new release?

- yuri

--
Yuri Takhteyev
Ph.D. Candidate, UC Berkeley School of Information
http://takhteyev.org/, http://www.freewisdom.org/
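For anyone who wants to try the corrected expression without opening script2.py, here is a minimal sketch of how it could be assembled and checked. Only the two middle lines of BRK come from the fix described above. The definition of NOBRACKET, the opening and closing pieces of BRK, the named groups, the REFERENCE_RE pattern and the match_reference helper are assumptions made for illustration and may differ from what markdown.py actually contains.

```python
import re

# Assumption: NOBRACKET matches any run of characters that are not brackets.
NOBRACKET = r'[^\]\[]*'

# Only the two middle lines are taken from the fix described in the message;
# the first and last pieces (and the group name) are assumed for illustration.
BRK = (r'\[(?P<title>'
       + (NOBRACKET + r'(\[') * 6
       + (NOBRACKET + r'\])*') * 6
       + NOBRACKET + r')\]')

# Hypothetical pattern for [title][link_id]; not copied from markdown.py.
REFERENCE_RE = re.compile(BRK + r'\s*\[(?P<link_id>[^\]]*)\]')


def match_reference(text):
    """Return (title, link_id) if the whole text is a reference link, else None."""
    m = REFERENCE_RE.fullmatch(text)
    if m is None:
        return None
    return m.group('title'), m.group('link_id')


def nested_title(depth):
    """Build a title made of `depth` nested bracket pairs around the letter t."""
    return '[' * depth + 't' + ']' * depth


if __name__ == '__main__':
    print(match_reference('[one [[[[[two]]]]] three][link_id]'))
    print(match_reference('[' + nested_title(6) + '][id]'))  # within the limit
    print(match_reference('[' + nested_title(7) + '][id]'))  # too deep
```

The sketch only checks that the shortened expression still accepts titles with up to 6 levels of nested brackets and rejects deeper ones. It does not measure the running time on long paragraphs, which is what script.py is for.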