From: Yuri T. <qar...@gm...> - 2008-10-13 08:43:05
|
I made a whole bunch of changes to the code, most of them just a matter of refactoring, but some affect the functionality too.

In terms of refactoring, the biggest change is splitting the Markdown class into three and making most of their methods private. I think this will make it easier to understand the code: you can now study one class at a time. Two of those three classes are still a little messy, but at least the messiness is contained.

So, we now have:

1. MarkdownParser - parses pre-processed Markdown source into an ElementTree.

Usage:

    tree = MarkdownParser.parseDocument(markdown_string)

The only other exposed methods are parseChunk() and detectTabbed(). I am tempted to hide them as well, but at the moment they are needed by some extensions.

2. InlineProcessor - runs inline patterns on an ElementTree.

Usage:

    InlineProcessor(patterns).applyInlinePatterns(tree)

This is the only exposed method. I also folded the InlineStash class into this.

3. Markdown - puts it all together.

Usage:

    Markdown(extensions).convert(markdown_string)
    Markdown(extensions).convertFile(input_file_path, output_path_or_stream, encoding)

The markdownFromFile() function still exists, but only has two lines now.

Another change, which does affect functionality, is that I incorporated Ben's treap implementation as a way of organizing pre-processors, patterns, etc. This kills two birds: we now have a better way of organizing those things, and this should also fix the problem reported by Eric Abrahamsen last week, which required a major change anyway. This breaks many extensions, and also breaks one non-extension test. But 2.0 is about as good of a chance as we will get for breaking backwards compatibility.

I updated the footnotes extension as an example of how to use the new system.

- yuri

-- http://sputnik.freewisdom.org/ |
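A rough sketch of how the three-way split described above might fit together. This is not the actual python-markdown code: ToyParser, ToyInlineProcessor and ToyMarkdown are hypothetical stand-ins that only borrow the method names from the message (parseDocument, applyInlinePatterns, convert), and the parsing rules are deliberately trivial.

```python
import re
import xml.etree.ElementTree as etree


class ToyParser:
    """Hypothetical stand-in for MarkdownParser: text in, ElementTree out."""

    def parseDocument(self, source):
        root = etree.Element("div")
        # Toy rule: blocks separated by blank lines become paragraphs.
        for block in re.split(r"\n\s*\n", source.strip()):
            if block:
                etree.SubElement(root, "p").text = block
        return etree.ElementTree(root)


class ToyInlineProcessor:
    """Hypothetical stand-in for InlineProcessor: rewrites text inside the tree."""

    EMPHASIS_RE = re.compile(r"\*([^*]+)\*")

    def applyInlinePatterns(self, tree):
        # Snapshot the paragraphs first, since we add children while looping.
        for p in list(tree.getroot().iter("p")):
            text = p.text or ""
            match = self.EMPHASIS_RE.search(text)
            if match:  # only the first *emphasis* per paragraph, to stay short
                p.text = text[:match.start()]
                em = etree.SubElement(p, "em")
                em.text = match.group(1)
                em.tail = text[match.end():]
        return tree


class ToyMarkdown:
    """Hypothetical stand-in for Markdown: wires the two pieces together."""

    def __init__(self):
        self.parser = ToyParser()
        self.inline = ToyInlineProcessor()

    def convert(self, source):
        tree = self.parser.parseDocument(source)
        self.inline.applyInlinePatterns(tree)
        return etree.tostring(tree.getroot(), encoding="unicode")


print(ToyMarkdown().convert("Hello *world*!"))
```

The point of the sketch is only the shape: each class can be read and tested on its own, and the facade needs nothing beyond the two public methods.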
From: Waylan L. <wa...@gm...> - 2008-10-13 13:33:51
|
On Mon, Oct 13, 2008 at 4:40 AM, Yuri Takhteyev <qar...@gm...> wrote:
> I made a whole bunch of changes to the code, most of them just a
> matter of refactoring, but some affect the functionality too.
> [snip]
> non-extension test. But 2.0 is about as good of a chance as we will
> get for breaking backwards compatibility.

Wow! You were busy last night. And I agree, now is definitely the time to make those changes. Unless you beat me to it, I'll start working on the extensions as soon as I can. And I'll update the writing_extensions.txt docs in the repo once I'm confident in how everything works.

As a sidenote, I'm intrigued by the MarkdownParser class. One could conceivably replace that class with their own which works differently internally - as long as it has the same public methods and returns an etree instance. This really opens up the possibility of overriding/changing the core stuff. Cool! -- And if we want to change the internal stuff, it should have little to no effect on the external api.

-- ---- Waylan Limberg wa...@gm... |
From: Yuri T. <qar...@gm...> - 2008-10-13 17:28:21
|
> As a sidenote, I'm intrigued by the MarkdownParser class. One could
> conceivably replace that class with their own which works differently
> internally - as long as it has the same public methods and returns an
> etree instance. This really opens up the possibility of
> overriding/changing the core stuff. Cool! -- And if we want to change
> the internal stuff, it should have little to no effect on the external
> api.

This wasn't the intention, but yes, this is true. Note also that you can use MarkdownParser's parseChunk() method in your custom parser. That is, you can parse certain things yourself, then delegate the rest to the original parser with parseChunk(parent, lines).

Again, the main motivation for splitting was to make it easy for people (even myself!) to understand what does what. Now if you want to understand how high-level parsing works, you only need to review 362 lines, not 2000+.

It also creates good granularity for adding unit testing. E.g., we can now write tests around MarkdownParser to keep track of both correctness and performance.

-- http://sputnik.freewisdom.org/ |
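The "parse some things yourself, delegate the rest" idea might look roughly like the sketch below. BaseParser is a hypothetical toy, not the real MarkdownParser; only the parseChunk(parent, lines) call shape comes from the message, and the "!!! " note syntax is made up for illustration.

```python
import xml.etree.ElementTree as etree


class BaseParser:
    """Hypothetical stand-in for the stock parser."""

    def parseDocument(self, source):
        root = etree.Element("div")
        self.parseChunk(root, source.split("\n"))
        return etree.ElementTree(root)

    def parseChunk(self, parent, lines):
        # Toy rule: every non-empty line becomes its own paragraph.
        for line in lines:
            if line.strip():
                etree.SubElement(parent, "p").text = line.strip()


class NoteParser(BaseParser):
    """Handles lines starting with '!!! ' itself, delegates everything else."""

    def parseChunk(self, parent, lines):
        pending = []
        for line in lines:
            if line.startswith("!!! "):
                # Hand the ordinary lines seen so far to the original parser,
                # so document order is preserved.
                super().parseChunk(parent, pending)
                pending = []
                note = etree.SubElement(parent, "div", {"class": "note"})
                note.text = line[4:].strip()
            else:
                pending.append(line)
        super().parseChunk(parent, pending)


root = NoteParser().parseDocument("one\n!!! careful\ntwo").getroot()
print(etree.tostring(root, encoding="unicode"))
```

Because the subclass keeps the same public methods and still returns an ElementTree, anything downstream of the parser would not need to know it was swapped in.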
From: Ben W. <bw...@da...> - 2008-10-13 13:57:44
|
Anyway to get an advanced look at the new 2.0? I went looking on the site and only found 1.7... |
From: Waylan L. <wa...@gm...> - 2008-10-13 15:12:16
|
Well, we haven't released yet... but the code is all in our git repo on gitorious.org [1]

[1]: http://gitorious.org/projects/python-markdown

On Mon, Oct 13, 2008 at 9:40 AM, Ben Wilson <bw...@da...> wrote:
> Anyway to get an advanced look at the new 2.0? I went looking on the site
> and only found 1.7...

-- ---- Waylan Limberg wa...@gm... |
From: Ben W. <bw...@da...> - 2008-10-13 15:46:49
|
Thanks. Hey, I noticed the commits. Can y'all share more on the Wiki Links? I'm surprised I've not been getting these messages in a while. I'm in the process of moving domain names, and suddenly I'm picking them up. So, I think I'm a bit behind. |
From: Yuri T. <qar...@gm...> - 2008-10-13 17:51:09
|
> Thanks. Hey, I noticed the commits. Can y'all share more on the Wiki

Let me finish the code first. There are still a few issues. First, there are a few tests that break. Second, the Treap implementation now requires Python 2.4. I think we can get it to work with 2.3 without too much hassle, though. Finally, I want to extend Treap a little to make the initial construction of the treap a little easier. I think the third argument to add() should be optional, defaulting to "after whatever is currently at the end". I.e., I want to be able to just write:

    self.inlinePatterns.add("escape", SimpleTextPattern(ESCAPE_RE))
    self.inlinePatterns.add("link", LinkPattern(LINK_RE))
    self.inlinePatterns.add("image_link", ImagePattern(IMAGE_LINK_RE))

> I'm surprised I've not been getting these messages in a while. I'm in
> the process of moving domain names, and suddenly I'm picking them up.

Well, glad you are still with us. To make a long story short, Artem Yunusov did a lot of work this summer porting the code to use ElementTree and starting the separation that I finished yesterday. (Artem really did most of the hard work, I just took the methods, sorted them into two classes, and then gave them simpler names.) This was my original goal for "2.0", which also created the opportunity to move to treap - only after 15 months of delay!

If you want to suggest any modifications to your treap implementation (or other things), you can send me patches, create a "clone" on gitorious, or email me your user name and I will add you as a committer so that you could create branches in our current repository.

- yuri

-- http://sputnik.freewisdom.org/ |
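To make the proposed add() behaviour concrete, here is a minimal sketch of an ordered registry whose third argument is optional and defaults to "append at the end". It is a plain list-backed stand-in, not Ben's treap, and the "<key" / ">key" location strings are an assumption made up for this example; the thread does not say what the third argument looks like.

```python
class OrderedRegistry:
    """List-backed stand-in for the treap: keeps (key, value) pairs in order."""

    def __init__(self):
        self._items = []

    def _index(self, key):
        for i, (existing, _) in enumerate(self._items):
            if existing == key:
                return i
        raise KeyError(key)

    def add(self, key, value, location=None):
        if location is None:
            # Default: after whatever is currently at the end.
            self._items.append((key, value))
        elif location.startswith("<"):
            # Hypothetical syntax: "<link" means "just before 'link'".
            self._items.insert(self._index(location[1:]), (key, value))
        elif location.startswith(">"):
            # Hypothetical syntax: ">link" means "just after 'link'".
            self._items.insert(self._index(location[1:]) + 1, (key, value))
        else:
            raise ValueError("location must look like '<key' or '>key'")

    def keys(self):
        return [key for key, _ in self._items]


patterns = OrderedRegistry()
patterns.add("escape", "ESCAPE_PATTERN")          # placeholder values
patterns.add("link", "LINK_PATTERN")
patterns.add("image_link", "IMAGE_LINK_PATTERN")
patterns.add("wikilink", "WIKILINK_PATTERN", "<link")  # an extension slots in
print(patterns.keys())  # ['escape', 'wikilink', 'link', 'image_link']
```

With the default in place, the core can build its list with two-argument calls, and only extensions that care about relative order need the third argument.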
From: Waylan L. <wa...@gm...> - 2008-10-13 17:26:59
|
On Mon, Oct 13, 2008 at 9:33 AM, Waylan Limberg <wa...@gm...> wrote:
> As a sidenote, I'm intrigued by the MarkdownParser class. One could
> conceivably replace that class with their own which works differently
> internally - as long as it has the same public methods and returns an
> etree instance. This really opens up the possibility of
> overriding/changing the core stuff. Cool!

Well, maybe not so cool. By making all (most) of the methods truly private it makes monkeypatching more difficult. I realize it's not that hard to use a subclass rather than monkeypatch, but what happens when two extensions each create their own subclass changing a different method? With monkeypatches, we just used the same instance and all was good. Now, that's not so easy. Sure, it's possible, but it definitely feels more hacky. For example, the CodeHilite extension has to do this:

    md.parser._MarkdownParser__processCodeBlock = __hiliteCodeBlock

instead of this:

    md._processCodeBlock = _hiliteCodeBlock

Or does anyone have suggestions for how to resolve different extensions all using different subclasses of MarkdownParser without each extension being specifically aware of the others? I don't see how mixins would work here either. Or am I missing something obvious?

-- ---- Waylan Limberg wa...@gm... |
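For readers unfamiliar with why the patch needs the `_MarkdownParser__` prefix: Python rewrites double-underscore names used inside a class body to `_ClassName__name`. The sketch below shows the effect with a hypothetical Parser class (not the real MarkdownParser) and a made-up hilite function.

```python
class Parser:
    def parse(self, text):
        # Inside the class body this is compiled as
        # self._Parser__processCodeBlock(text).
        return self.__processCodeBlock(text)

    def __processCodeBlock(self, text):
        return "<pre>" + text + "</pre>"


def hilite(text):
    # Replacement stored on the instance, so it receives no 'self'.
    return '<pre class="hilite">' + text + "</pre>"


parser = Parser()
before = parser.parse("x = 1")

# Patching from outside the class has to spell out the mangled name.
parser._Parser__processCodeBlock = hilite
after = parser.parse("x = 1")

# At module level no mangling happens, so this creates an unrelated
# attribute literally named '__processCodeBlock' that parse() never sees.
parser.__processCodeBlock = lambda text: "ignored"
still_patched = parser.parse("x = 1")

print(before)
print(after)
print(still_patched == after)
```

The patch still works on a shared instance, as before the refactoring, but every extension now has to know the class name that owns the method, which is the awkwardness described above.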
From: Yuri T. <qar...@gm...> - 2008-10-13 17:40:18
|
> Well, maybe not so cool. By making all (most) of the methods truly
> private it makes monkeypatching more difficult. I realize it's not that
> hard to use a subclass rather than monkeypatch, but what happens when
> two extensions each create their own subclass changing a different
> method? With monkeypatches, we just used the same instance and all was
> good.

They don't have to stay private. I decided to start by making them private in order to expose any questionable dependencies that we may have. We can then think of whether there are a few more methods that may be worth exposing.

> md.parser._MarkdownParser__processCodeBlock = __hiliteCodeBlock

I think there is an entirely different (and better) way to do this now. Use the standard MarkdownParser, then write a postprocessor to modify the eTree. At the moment, it appears that we don't offer an option of modifying the tree before the patterns are run, but we should. I.e., our pipeline should be:

1. text pre-processors (text-in, text-out) - tempted to drop this
2. line pre-processors (line list in, line list out)
3. MarkdownParser.parseDocument() - substitute your own if you want
4. pre-pattern post-processors (modify the tree before any patterns are run)
5. InlineProcessor.applyInlinePatterns()
6. etree postprocessors (modify eTree)
7. serialization of the etree into a string
8. text postprocessors (text-in, text-out)

My generic recommendation now would be that extension writers first look into whether they can do what they want to do by adding post-processors at steps 4, 6 or 8, or by adding patterns.

- yuri

-- http://sputnik.freewisdom.org/ |
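As an illustration of the "modify the eTree afterwards instead of patching the parser" approach, a tree postprocessor could be as small as the sketch below. It is not the CodeHilite extension; hilite_code_blocks and the "codehilite" class name are hypothetical, and real highlighting would do far more than set an attribute.

```python
import xml.etree.ElementTree as etree


def hilite_code_blocks(root, css_class="codehilite"):
    """Mark every <pre> that directly contains a <code> child."""
    for pre in root.iter("pre"):
        if pre.find("code") is not None:
            # Only attributes change, so iterating while editing is safe here.
            pre.set("class", css_class)
    return root


# Pretend this tree came out of the standard parser.
root = etree.fromstring(
    "<div><p>text</p><pre><code>x = 1</code></pre></div>"
)
hilite_code_blocks(root)
html = etree.tostring(root, encoding="unicode")
print(html)
```

Since the function only sees the finished tree, two extensions written this way can both run on the same document without knowing about each other or about the parser's private methods.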
From: Waylan L. <wa...@gm...> - 2008-10-13 23:22:25
|
On Mon, Oct 13, 2008 at 1:40 PM, Yuri Takhteyev <qar...@gm...> wrote:
[snip]
>> md.parser._MarkdownParser__processCodeBlock = __hiliteCodeBlock
>
> I think there is an entirely different (and better) way to do this
> now. Use the standard MarkdownParser, then write a postprocessor to
> modify the eTree.

Don't know why I didn't think of this before. eTree makes it easy. I just pushed a refactored CodeHilite extension. Much cleaner.

> At the moment, it appears that we don't offer an
> option of modifying the tree before the patterns are run, but we
> should. I.e., our pipeline should be:
>
> 1. text pre-processors (text-in, text-out) - tempted to drop this
> 2. line pre-processors (line list in, line list out)
> 3. MarkdownParser.parseDocument() - substitute your own if you want
> 4. pre-pattern post-processors (modify the tree before any patterns are run)
> 5. InlineProcessor.applyInlinePatterns()
> 6. etree postprocessors (modify eTree)
> 7. serialization of the etree into a string
> 8. text postprocessors (text-in, text-out)

Why not just make the InlineProcessor be one of the 'postprocessors' and then extensions can add additional postprocessors either before or after it as needed?

-- ---- Waylan Limberg wa...@gm... |
From: Yuri T. <qar...@gm...> - 2008-10-13 23:35:29
|
> Why not just make the InlineProcessor be one of the 'postprocessors'
> and then extensions can add additional postprocessors either before or
> after it as needed?

Good point. If we then also get rid of the preprocessor/textpreprocessor distinction, we can just reduce it all to three:

Preprocessor treap: HtmlBlock, Header, Line, Reference
Treeprocessors treap: Inline
Postprocessors treap: Prettify, RawHtml, AndSubstitute

Extensions can then insert processors into one of those three treaps and also insert patterns into the inline processor. (Or they can replace InlineProcessor with their own.)

- yuri

-- http://sputnik.freewisdom.org/ |
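The three-stage arrangement proposed here (line preprocessors, then tree processors, then text postprocessors) can be sketched as one small function. Everything below is hypothetical: plain lists stand in for the treaps, and the individual processors are toys chosen only to show what each stage receives and returns.

```python
import xml.etree.ElementTree as etree


def run_pipeline(source, preprocessors, parse, treeprocessors, postprocessors):
    lines = source.split("\n")
    for pre in preprocessors:          # line list in, line list out
        lines = pre(lines)
    root = parse(lines)                # line list in, element tree out
    for treeproc in treeprocessors:    # tree in, tree out
        root = treeproc(root)
    text = etree.tostring(root, encoding="unicode")
    for post in postprocessors:        # text in, text out
        text = post(text)
    return text


def strip_trailing_whitespace(lines):
    return [line.rstrip() for line in lines]


def parse_paragraphs(lines):
    root = etree.Element("div")
    for line in lines:
        if line:
            etree.SubElement(root, "p").text = line
    return root


def upper_case_text(root):
    for p in root.iter("p"):
        p.text = (p.text or "").upper()
    return root


def unwrap_root(text):
    # Drop the outer <div> wrapper if it is present.
    if text.startswith("<div>") and text.endswith("</div>"):
        return text[len("<div>"):-len("</div>")]
    return text


result = run_pipeline(
    "hello  \nworld",
    [strip_trailing_whitespace],
    parse_paragraphs,
    [upper_case_text],
    [unwrap_root],
)
print(result)
```

An inline processor would simply be one more entry in the tree-processor list, which is what lets extensions place their own tree processors before or after it.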
From: Waylan L. <wa...@gm...> - 2008-10-14 00:34:07
|
On Mon, Oct 13, 2008 at 7:35 PM, Yuri Takhteyev <qar...@gm...> wrote:
>> Why not just make the InlineProcessor be one of the 'postprocessors'
>> and then extensions can add additional postprocessors either before or
>> after it as needed?
>
> Good point. If we then also get rid of the
> preprocessor/textpreprocessor distinction, we can just reduce it all
> to three:
>
> Preprocessor treap: HtmlBlock, Header, Line, Reference
> Treeprocessors treap: Inline
> Postprocessors treap: Prettify, RawHtml, AndSubstitute

Except that Prettify is a Treeprocessor. In any event, I like this naming much better (pre, tree, post). It's much clearer what's going on.

> Extensions can then insert processors into one of those three treaps
> and also insert patterns into the inline processor. (Or they can
> replace InlineProcessor with their own.)

...or they can replace/subclass the MarkdownParser. With this api, someone could use the Markdown engine and rewrite a completely different markup language. Not that one should, but the fact that one can is a testament to the api IMO.

-- --- Waylan Limberg wa...@gm... |
From: Waylan L. <wa...@gm...> - 2008-10-20 14:26:37
|
On Mon, Oct 13, 2008 at 7:35 PM, Yuri Takhteyev <qar...@gm...> wrote:
>> Why not just make the InlineProcessor be one of the 'postprocessors'
>> and then extensions can add additional postprocessors either before or
>> after it as needed?
>
> Good point. If we then also get rid of the
> preprocessor/textpreprocessor distinction, we can just reduce it all
> to three:

FYI, I just pushed the last of these changes. We now only have three types of processors:

Preprocessor treap: HtmlBlock, Header, Line, Reference
Treeprocessors treap: Inline, Prettify
Postprocessors treap: RawHtml, AndSubstitute

If anyone has the old TextPostprocessors, or either of the old postprocessors in your extensions, you'll need to make a few minor updates for things to work. InlinePatterns should be unaffected - it's just that now you can manipulate the tree before they are run if you desire.

I should also mention that all this stuff is fully documented in docs/writing_extensions.txt. Any improvements, corrections, suggestions are welcome.

-- ---- Waylan Limberg wa...@gm... |