From: Artem Y. <ne...@gm...> - 2008-07-15 15:21:03
|
I reformatted the test suite and fixed a lot of bugs in the ElementTree version, and now all the tests pass.

I changed the handling of hrs, because in the NanoDOM version top-level hrs were surrounded with p tags, and those p tags were stripped out in the toxml method. Now LinePreprocessor replaces all hr declarations with "___". I added a Markdown._processHR method, and Markdown._processSection now also checks for hr. I was also forced to add this check to Markdown._processParagraph. Maybe the simplest and fastest fix would be to just replace every "<p><hr /></p>" with "<hr />" after serialization, but then we wouldn't get a valid ElementTree.

Concerning attributes ({@id=1234}), which used to be handled by NanoDOM: I added a global function handleAttributes(text, parent), because it's needed both in inline patterns (ImagePattern) and in the Markdown class. We now process attributes in Markdown._processTree, after applying the inline patterns but still in the same loop. The new version is slower than the previous one because of these changes, but still faster than the new version with NanoDOM.

I also fixed ticket #5 [1] in the GSoC etree branch. Changing the order of the inline patterns works, but then other tests fail. I changed BACKTICK_RE to r'[^\\]\`([^\`]*[^\\]{0,1})\`'; after that everything works fine, except that the last character before the backtick gets stripped. For instance, "test `test`" becomes "test<code>test</code>" instead of "test <code>test</code>". That's because of the negated character class ([^\\]) at the beginning of the regexp, which consumes that character. So I decided to add a contentGroup attribute to the Pattern class, holding the number of the group we'd like to replace; by default it equals 2. I changed the regexp to r'([^\\])\`([^\`]*[^\\]{0,1})\`', so now we should use group 3 instead of group 2, and we create the pattern this way: BacktickPattern(BACKTICK_RE, 3). Groups 1 and 2 joined together are the string to the left of the match.

[1]: http://www.freewisdom.org/projects/python-markdown/Tickets/000005 |
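The backtick problem described above can be reproduced outside of Markdown with plain `re`. This is only a sketch: the first two patterns are copied from the message, while `LOOKBEHIND_RE` and the helper `to_code` are assumptions added for illustration and are not taken from the project. Group numbers here are one lower than in the message, because this standalone version has no extra leading group for the text to the left of the match.

```python
import re

# From the message: the leading [^\\] consumes the character before the backtick.
CONSUMING_RE = r'[^\\]\`([^\`]*[^\\]{0,1})\`'
# From the message: the same character is captured so it can be put back.
GROUPED_RE = r'([^\\])\`([^\`]*[^\\]{0,1})\`'
# Assumption, not from the message: a zero-width negative lookbehind,
# which checks for a preceding backslash without consuming anything.
LOOKBEHIND_RE = r'(?<!\\)\`([^\`]*[^\\]{0,1})\`'


def to_code(regex, text, content_group, keep_group=None):
    """Wrap backtick spans in <code>, re-emitting keep_group if given."""
    def repl(match):
        kept = match.group(keep_group) if keep_group else ''
        return '%s<code>%s</code>' % (kept, match.group(content_group))
    return re.sub(regex, repl, text)


print(to_code(CONSUMING_RE, 'test `test`', 1))              # space is lost
print(to_code(GROUPED_RE, 'test `test`', 2, keep_group=1))  # space is kept
print(to_code(LOOKBEHIND_RE, 'test `test`', 1))             # space is kept
```

With a lookbehind the content stays in a single fixed group, so no per-pattern group number would be needed. Whether that matches the fix eventually used in the branch is not stated in this thread.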
From: Artem Y. <ne...@gm...> - 2008-07-19 13:28:19
|
I deleted the Pattern.contentGroup attribute that I previously added, because I solved this problem using regexps.

I also created a new regexp for links. It partly solves ticket #4: it works fine until you try to put nested parentheses in a link. In the Perl implementation that's solved with a recursive regexp, but I don't see any way of doing it in Python using regexps. Perl code:

    $g_nested_parens = qr{
        (?>                                 # Atomic matching
           [^()\s]+                         # Anything other than parens or whitespace
         |
           \(
               (??{ $g_nested_parens })     # Recursive set of nested brackets
           \)
        )*
    }x;

The link regexp now works for angled links too, so I deleted LINK_ANGLED_PATTERN from the patterns list. The link regexp is now quite complicated, so what about using the re.VERBOSE flag? I tried it without any changes to the regexps, but it doesn't work; it seems that Python goes into an infinite loop after adding this flag.

I also created a combined regexp for STRONG_RE and STRONG_2_RE, and a combined regexp for STRONG_EM_RE and STRONG_EM_2_RE, so STRONG_2_RE and STRONG_EM_2_RE can be deleted from the patterns list.

Now I'm gathering different issues/bugs. I think I'll post them on Monday so we can discuss which of them we want to fix. Another thing I plan to do is port the extensions to ElementTree, and maybe do some refactoring. For instance, the CorePatterns class is scheduled for refactoring, but right now I have no idea what a better replacement for it would be. |
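On the re.VERBOSE attempt: under that flag Python ignores unescaped whitespace in the pattern and treats `#` as the start of a comment, so a regexp written for normal mode changes meaning when the flag is simply switched on. The sketch below shows that effect. It does not claim to explain the apparent infinite loop, and `LINK_RE` is a hypothetical, much simplified link pattern, not the one in the branch.

```python
import re

plain = re.compile(r'a b')
# Same pattern text, flag added: the space is now ignored.
verbose_unchanged = re.compile(r'a b', re.VERBOSE)
# Rewritten for verbose mode: a literal space goes in a character class.
verbose_fixed = re.compile(r'''
    a
    [ ]     # literal space
    b
''', re.VERBOSE)

print(bool(plain.search('a b')))              # True
print(bool(verbose_unchanged.search('a b')))  # False
print(bool(verbose_unchanged.search('ab')))   # True
print(bool(verbose_fixed.search('a b')))      # True

# Hypothetical simplified inline-link pattern, laid out with comments.
LINK_RE = re.compile(r'''
    \[ ([^\]]*) \]          # link text
    \( \s*
    <? ([^)\s>]*) >?        # URL, optionally in angle brackets
    \s* \)
''', re.VERBOSE)

print(LINK_RE.search('[a](<http://x.y>)').group(2))  # http://x.y
```

So each existing regexp would probably need its literal spaces and `#` characters escaped (or wrapped in a character class) before the flag could be enabled.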
From: Waylan L. <wa...@gm...> - 2008-07-19 17:47:53
|
On Sat, Jul 19, 2008 at 9:28 AM, Artem Yunusov <ne...@gm...> wrote:
> I deleted Pattern.contentGroup attribute, that I previously added,
> because I solved this problem using regexps. I also created new regexp
> for links, it partly solves ticket#4 because it works fine, until you'll
> try to insert nested parenthesis in link, in Perl implementation it's
> solved with recursive regexp, but I don't see any way of doing it in
> Python using regexps.
>

Yeah, I was just looking at that yesterday. A small annoyance.

[snip]
> Maybe some refactoring. For instance class CorePatterns scheduled for
> refactoring, but now I don't have an idea what can be a better
> replacement for it.
>

I'd leave this alone for now. IMO, what needs to happen is the current code should be (mostly) thrown out (sorry Yuri) and we need to create some easily extendable API where each block-level parser is its own class or something (perhaps a state machine or similar). And in the process, we need to lose the recursion that we have now. I suspect this would be a major undertaking and an entire summer project in itself.

The recursion issue could probably be dealt with on its own, but while we're in there, we might as well make it extendable. Currently, if you wanted to write an extension that added another block-level type (say definition lists), you would have to both create a processor for them and then override the _processSection method (through a monkey patch) with your own that reimplements the entire existing method with the appropriate calls to your new definition list processor added into the logic. It's ugly and complicated and doesn't work if multiple extensions take the same approach.

However, that does bring to mind something you could do. Perhaps try creating a definition list extension as a preprocessor. Not sure how you'd get it into the DOM, but I'm sure there are a few possibilities. Just be sure to follow the syntax used by PHP Markdown Extra. That seems to be the community-accepted syntax.

Oh, and while I'm at it, I should mention that I committed a small patch [1] for an undocumented bug that David had reintroduced while refactoring the extension API. Perhaps you're keeping an eye on the master repo and already know about it, but you may want to merge that into your code.

[1]: http://gitorious.org/projects/python-markdown/repos/mainline/commits/bf7cf776daa26d734c10a6039efe64113f066045

--
----
Waylan Limberg
wa...@gm... |
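To make the "each block-level parser is its own class" idea above more concrete, here is a rough sketch of what such an API might look like. Every name in it (`BlockProcessor`, `BlockParser`, `register`, and the processors) is hypothetical and invented for illustration; none of this exists in the code under discussion. It handles flat blocks only, in a plain loop without recursion, and the definition list handling is a heavily simplified take on the PHP Markdown Extra syntax (a term line followed by lines starting with a colon).

```python
import re
from xml.etree import ElementTree as etree


class BlockProcessor:
    """Hypothetical base class: one subclass per block-level construct."""

    def test(self, block):
        raise NotImplementedError

    def run(self, parent, block):
        raise NotImplementedError


class HRProcessor(BlockProcessor):
    RE = re.compile(r'^[ ]{0,3}([-_*])([ ]?\1){2,}[ ]*$')

    def test(self, block):
        return bool(self.RE.match(block))

    def run(self, parent, block):
        etree.SubElement(parent, 'hr')


class ParagraphProcessor(BlockProcessor):
    def test(self, block):
        return True  # fallback: anything not claimed earlier is a paragraph

    def run(self, parent, block):
        etree.SubElement(parent, 'p').text = block


class DefListProcessor(BlockProcessor):
    """Simplified definition list: a term line, then ': definition' lines."""

    def test(self, block):
        lines = block.split('\n')
        return (len(lines) > 1 and not lines[0].startswith(':')
                and all(line.startswith(':') for line in lines[1:]))

    def run(self, parent, block):
        lines = block.split('\n')
        dl = etree.SubElement(parent, 'dl')
        etree.SubElement(dl, 'dt').text = lines[0].strip()
        for line in lines[1:]:
            etree.SubElement(dl, 'dd').text = line[1:].strip()


class BlockParser:
    def __init__(self):
        self.processors = [HRProcessor(), ParagraphProcessor()]

    def register(self, processor, index=0):
        """Extensions add a processor here instead of monkey patching."""
        self.processors.insert(index, processor)

    def parse(self, text):
        root = etree.Element('div')
        for block in re.split(r'\n{2,}', text.strip()):
            for processor in self.processors:
                if processor.test(block):
                    processor.run(root, block)
                    break
        return root


parser = BlockParser()
parser.register(DefListProcessor())
root = parser.parse('Apple\n:   A fruit\n\n---\n\nplain text')
print([child.tag for child in root])  # ['dl', 'hr', 'p']
```

Because extensions only call `register`, several of them could add block types without overriding each other. Nested constructs (lists inside blockquotes and so on) are deliberately left out of this sketch.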
From: Artem Y. <ne...@gm...> - 2008-07-20 22:52:26
|
Waylan Limberg wrote:
> On Sat, Jul 19, 2008 at 9:28 AM, Artem Yunusov <ne...@gm...> wrote:
>
>> I deleted Pattern.contentGroup attribute, that I previously added,
>> because I solved this problem using regexps. I also created new regexp
>> for links, it partly solves ticket#4 because it works fine, until you'll
>> try to insert nested parenthesis in link, in Perl implementation it's
>> solved with recursive regexp, but I don't see any way of doing it in
>> Python using regexps.
>>
>>
>
> Yeah I was just looking at that yesterday. A small annoyance.
>
> [snip]
>

Maybe we should use some combination of parsing and regexps here. But I don't think it's a very important bug right now.

>
>> Maybe some refactoring. For instance class CorePatterns scheduled for
>> refactoring, but now I don't have an idea what can be a better
>> replacement for it.
>>
>>
>
> I'd leave this alone for now. IMO, what needs to happen is the current
> code should be (mostly) thrown out (sorry Yuri) and we need to create
> some easily extendable API where each block-level parser is its own
> class or something (perhaps a state machine or similar). And in the
> process, we need to lose the recursion that we have now. I suspect
> this would be a major undertaking and an entire summer project in
> itself.
>
> The recursion issue could probably be dealt with on it's own, but
> while we're in there, we might as well make is extendable. Currently,
> if you wanted to write an extension that added another block-level
> type (say definition lists) you would have to both create a processor
> for them and then override the _processSection method (through a
> monkey patch) with your own that reimplements the entire existing
> method with the appropriate calls to your new definition list
> processor added into the logic. It's ugly and complicated and doesn't
> work if multiple extensions take the same approach.
>

Hmm, so then there will be a new type of extension, something like block-level patterns. I think it'll slow Markdown down, but it will certainly be quite useful. Yuri, what do you think about it?

> However, that does bring to mind something you could do. Perhaps try
> creating a definition list extension as a preproccessor. Not sure how
> you'd get it into the DOM, but I'm sure there are a few possibilities.
> Just be sure to follow the syntax used by PHP Markdown Extra. That
> seems to be the community accepted syntax.
>
> Oh, and while I'm at it, I should mention that I committed a small
> patch [1] for an undocumented bug that David had reintroduced while
> refactoring the extension api. Perhaps you're keeping an eye on the
> master repo and already know about it, but you may want to merge that
> into your code.
>
> [1]: http://gitorious.org/projects/python-markdown/repos/mainline/commits/bf7cf776daa26d734c10a6039efe64113f066045
>

Yep, I saw it. I think I'll integrate it into the GSoC version. |
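The "combination of parsing and regexps" suggested in this reply for nested parentheses in links might be sketched as follows: a regexp finds the start of a link, and a small hand-written scanner counts parentheses to find where the destination ends. The names `LINK_START_RE` and `find_link` are hypothetical, and the sketch ignores titles, angle brackets and escaping; it only looks at the first candidate in the text.

```python
import re

# Regexp part: link text followed by the opening parenthesis.
LINK_START_RE = re.compile(r'\[([^\]]*)\]\(')


def find_link(text):
    """Return (label, destination, end_index) for the first inline link,
    or None if there is no link or its parentheses are unbalanced."""
    match = LINK_START_RE.search(text)
    if match is None:
        return None
    depth = 1                      # the "(" consumed by the regexp
    i = match.end()
    while i < len(text):
        ch = text[i]
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth == 0:         # matching close of the opening "("
                return match.group(1), text[match.end():i], i + 1
        i += 1
    return None


sample = 'see [wiki](http://en.wikipedia.org/wiki/Markdown_(markup)) now'
label, dest, end = find_link(sample)
print(label)         # wiki
print(dest)          # http://en.wikipedia.org/wiki/Markdown_(markup)
print(sample[end:])  # " now"
```

The scanner is a single linear pass, so it should not add much overhead on top of the regexp search.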