Thread: [Python-markdown-discuss] refactoring and treap integration

[Python-markdown-discuss] refactoring and treap integration

From: Yuri T. <qar...@gm...> - 2008-10-13 08:43:05

I made a whole bunch of changes to the code, most of them just a
matter of refactoring, but some affect the functionality too.

In terms of refactoring, the biggest change is splitting Markdown
class into three and making most of their methods private.  I think
this will make it easier to understand the code: you can now study one
class at a time.  Two of those three classes are still a little messy,
but at least the messiness is contained.

So, we now have:

1. MarkdownParser - parses pre-processed Markdown source into an ElementTree.

Usage:

    tree = MarkdownParser.parseDocument(markdown_string)

The only other exposed methods are parseChunk() and detectTabbed().  I
am tempted to hide them as well, but at the moment they are needed by
some extensions.

2. InlineProcessor - runs inline patterns on an ElementTree

Usage:

    InlineProcessor(patterns).applyInlinePatterns(tree)

This is the only exposed method.  I also folded the InlineStash class into it.

3. Markdown - puts it all together.

Usage:

    Markdown(extensions).convert(markdown_string)
    Markdown(extensions).convertFile(input_file_path,
                                     output_path_or_stream, encoding)

The markdownFromFile() function still exists, but it is now only two lines long.
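To make the division of labour concrete, here is a toy sketch of how the three pieces compose. Everything in it is a hypothetical stand-in: only the method names parseDocument(), applyInlinePatterns() and convert() come from the message above. The real Markdown class takes extensions rather than a list of patterns, and the real parser handles far more than paragraphs.

```python
import re
import xml.etree.ElementTree as etree


class MarkdownParser:
    """Toy block parser: each blank-line-separated chunk becomes a <p>."""

    def parseDocument(self, source):
        root = etree.Element("div")
        for chunk in re.split(r"\n\s*\n", source.strip()):
            if chunk:
                etree.SubElement(root, "p").text = chunk
        return etree.ElementTree(root)


class InlineProcessor:
    """Toy inline pass: applies (compiled regex, tag) pairs to paragraph text."""

    def __init__(self, patterns):
        self.patterns = patterns

    def applyInlinePatterns(self, tree):
        for p in list(tree.getroot().iter("p")):
            for regex, tag in self.patterns:
                # Each regex needs exactly one capture group; only p.text
                # (the text before any child elements) is scanned here.
                pieces = regex.split(p.text or "")
                p.text = pieces[0]
                for n, i in enumerate(range(1, len(pieces), 2)):
                    child = etree.Element(tag)
                    child.text, child.tail = pieces[i], pieces[i + 1]
                    p.insert(n, child)
        return tree


class Markdown:
    """Puts it together: block parsing, then inline patterns, then serialization."""

    def __init__(self, patterns):
        self.parser = MarkdownParser()
        self.inline = InlineProcessor(patterns)

    def convert(self, source):
        tree = self.parser.parseDocument(source)
        tree = self.inline.applyInlinePatterns(tree)
        return etree.tostring(tree.getroot(), encoding="unicode")


md = Markdown([(re.compile(r"\*(.+?)\*"), "em")])
print(md.convert("Hello *world*.\n\nBye."))
# -> <div><p>Hello <em>world</em>.</p><p>Bye.</p></div>
```

The point of the split is visible even in the toy: each class can be read, replaced or tested on its own, and Markdown only wires them together.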

Another change, which does affect functionality, is that I
incorporated Ben's treap implementation as a way of organizing
pre-processors, patterns, etc.  This kills two birds: we now have a
better way of organizing those things, and this should also fix the
problem reported by Eric Abrahamsen last week, which required a major
change anyway.  This breaks many extensions, and also breaks one
non-extension test.  But 2.0 is about as good of a chance as we will
get for breaking backwards compatibility.

I updated the footnotes extension as an example of how to use the new system.

- yuri

-- 
http://sputnik.freewisdom.org/

Re: [Python-markdown-discuss] refactoring and treap integration

From: Waylan L. <wa...@gm...> - 2008-10-13 13:33:51

On Mon, Oct 13, 2008 at 4:40 AM, Yuri Takhteyev <qar...@gm...> wrote:
> I made a whole bunch of changes to the code, most of them just a
> matter of refactoring, but some affect the functionality too.
>
[snip]
> non-extension test.  But 2.0 is about as good of a chance as we will
> get for breaking backwards compatibility.
>

Wow! You were busy last night. And I agree, now is definitely the time
to make those changes.

Unless you beat me to it, I'll start working on the extensions as soon
as I can. And I'll update the writing_extensions.txt docs in the repo
once I'm confident in how everything works.

As a sidenote, I'm intrigued by the MarkdownParser class. One could
conceivably replace that class with their own which works differently
internally - as long as it has the same public methods and returns an
etree instance. This really opens up the possibility of
overriding/changing the core stuff. Cool! -- And if we want to change
the internal stuff, it should have little to no effect on the external
api.

-- 
----
Waylan Limberg
wa...@gm...

Re: [Python-markdown-discuss] refactoring and treap integration

From: Yuri T. <qar...@gm...> - 2008-10-13 17:28:21

> As a sidenote, I'm intrigued by the MarkdownParser class. One could
> conceivably replace that class with their own which works differently
> internally - as long as it has the same public methods and returns an
> etree instance. This really opens up the possibility of
> overriding/changing the core stuff. Cool! -- And if we want to change
> the internal stuff, it should have little to no effect on the external
> api.

This wasn't the intention, but yes, this is true.  Note also that you
can use MarkdownParser's parseChunk() method in your custom parser.
That is, you can parse certain things yourself, then delegate the rest
to the original parser with parseChunk(parent, lines).
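A minimal sketch of that delegation, assuming parseChunk(parent, lines) appends block elements built from the given lines to parent. The MarkdownParser below is a tiny stand-in rather than the real class, and both the "%% " note syntax and the NoteAwareParser subclass are hypothetical.

```python
import xml.etree.ElementTree as etree


class MarkdownParser:
    """Stand-in for the real parser: turns a list of lines into a <p>."""

    def parseChunk(self, parent, lines):
        text = "\n".join(line for line in lines if line.strip())
        if text:
            etree.SubElement(parent, "p").text = text


class NoteAwareParser(MarkdownParser):
    """Hypothetical custom parser: handles '%% ' lines itself, delegates the rest."""

    def parseDocument(self, source):
        root = etree.Element("div")
        pending = []  # ordinary lines waiting to be handed to parseChunk()
        for line in source.split("\n"):
            if line.startswith("%% "):
                self.parseChunk(root, pending)  # flush ordinary text first
                pending = []
                etree.SubElement(root, "aside").text = line[3:]
            else:
                pending.append(line)
        self.parseChunk(root, pending)  # whatever is left after the last note
        return etree.ElementTree(root)


doc = NoteAwareParser().parseDocument("Intro line\n%% a side note\nMore text")
print(etree.tostring(doc.getroot(), encoding="unicode"))
# -> <div><p>Intro line</p><aside>a side note</aside><p>More text</p></div>
```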

Again, the main motivation for splitting was to make it easy for
people (even myself!) to understand what does what.  Now if you want
to understand how high-level parsing works, you only need to review
362 lines, not 2000+.  It also creates good granularity for adding
unit testing.  E.g., we can now write tests around MarkdownParser to
keep track of both correctness and performance.

--
http://sputnik.freewisdom.org/

Re: [Python-markdown-discuss] refactoring and treap integration

From: Ben W. <bw...@da...> - 2008-10-13 13:57:44

Any way to get an advance look at the new 2.0? I went looking on the site
and only found 1.7...

Re: [Python-markdown-discuss] refactoring and treap integration

From: Waylan L. <wa...@gm...> - 2008-10-13 15:12:16

Well, we haven't released yet...

but the code is all in our git repo on gitorious.org [1]

[1]: http://gitorious.org/projects/python-markdown

-- 
----
Waylan Limberg
wa...@gm...

Re: [Python-markdown-discuss] refactoring and treap integration

From: Ben W. <bw...@da...> - 2008-10-13 15:46:49

Thanks. Hey, I noticed the commits. Can y'all share more on the Wiki
Links?

I'm surprised; I haven't been getting these messages for a while. I'm
in the process of moving domain names, and suddenly I'm picking them up.
So, I think I'm a bit behind.

Re: [Python-markdown-discuss] refactoring and treap integration

From: Yuri T. <qar...@gm...> - 2008-10-13 17:51:09

> Thanks. Hey, I noticed the commits. Can y'all share more on the Wiki

Let me finish the code first.  There are still a few issues.  First,
a few tests still break.  Second, the Treap implementation now
requires Python 2.4; I think we can get it to work with 2.3 without
too much hassle, though.  Finally, I want to extend Treap a little to
make the initial construction of the treap easier.  I think the third
argument to add() should be optional, defaulting to "after whatever is
currently at the end".  That is, I want to be able to just write:

        self.inlinePatterns.add("escape", SimpleTextPattern(ESCAPE_RE))
        self.inlinePatterns.add("link", LinkPattern(LINK_RE))
        self.inlinePatterns.add("image_link", ImagePattern(IMAGE_LINK_RE))
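A sketch of the ordering behaviour described above, with the position argument defaulting to "append at the end". Note that this swaps the treap for a plain Python list plus a dict, which is enough to show the ordering semantics but not a treap's balancing; the class name and the "_begin", "_end", "<name" and ">name" location strings are illustrative assumptions, not the actual treap API.

```python
class OrderedRegistry:
    """Hypothetical stand-in for the treap: named values kept in an explicit order.

    A plain list + dict replaces the treap here; only the ordering semantics
    are illustrated.
    """

    def __init__(self):
        self._keys = []    # names, in processing order
        self._values = {}  # name -> value

    def add(self, key, value, location="_end"):
        """Insert value under key.

        location: "_begin", "_end" (the default), "<name" (just before name)
        or ">name" (just after name).
        """
        if key in self._values:
            raise KeyError("duplicate key: %r" % key)
        if location == "_begin":
            index = 0
        elif location == "_end":
            index = len(self._keys)
        elif location.startswith(("<", ">")):
            index = self._keys.index(location[1:])  # ValueError if name is unknown
            if location[0] == ">":
                index += 1
        else:
            raise ValueError("bad location: %r" % location)
        self._keys.insert(index, key)
        self._values[key] = value

    def keys(self):
        return list(self._keys)

    def __iter__(self):
        # Values in their current order, e.g. for running patterns in sequence.
        return (self._values[k] for k in self._keys)


# Strings stand in for pattern objects.
patterns = OrderedRegistry()
patterns.add("escape", "ESCAPE")               # no location: appended at the end
patterns.add("link", "LINK")
patterns.add("image_link", "IMAGE_LINK")
patterns.add("autolink", "AUTOLINK", "<link")  # explicit: just before "link"
print(patterns.keys())  # -> ['escape', 'autolink', 'link', 'image_link']
```

With a default location, the common case of building the initial list reads top to bottom, and explicit positions are only needed when an extension must slot in relative to an existing entry.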

> I'm surprised I've not been getting these messages in a while. I'm in
> the process of moving domain names, and suddenly I'm picking them up.

Well, glad you are still with us.  To make a long story short, Artem
Yunusov did a lot of work this summer porting the code to use
ElementTree and starting the separation that I finished yesterday.
(Artem really did most of the hard work; I just took the methods,
sorted them into two classes, and gave them simpler names.)  This
was my original goal for "2.0", reached only after 15 months of delay,
and it also created the opportunity to move to the treap.

If you want to suggest any modifications to your treap implementation
(or other things), you can send me patches, create a "clone" on
gitorious, or email me your user name and I will add you as a committer
so that you can create branches in our current repository.

- yuri

-- 
http://sputnik.freewisdom.org/

Re: [Python-markdown-discuss] refactoring and treap integration

From: Waylan L. <wa...@gm...> - 2008-10-13 17:26:59

On Mon, Oct 13, 2008 at 9:33 AM, Waylan Limberg <wa...@gm...> wrote:
> As a sidenote, I'm intrigued by the MarkdownParser class. One could
> conceivably replace that class with their own which works differently
> internally - as long as it has the same public methods and returns an
> etree instance. This really opens up the possibility of
> overriding/changing the core stuff. Cool!

Well, maybe not so cool. Making all (or most) of the methods truly
private makes monkeypatching more difficult. I realize it's not that
hard to use a subclass rather than a monkeypatch, but what happens when
two extensions each create their own subclass, each changing a different
method? With monkeypatches, we just used the same instance and all was
good.

Now, that's not so easy. Sure, it's possible, but it definitely feels more
hacky. For example, the CodeHilite extension has to do this:

    md.parser._MarkdownParser__processCodeBlock = __hiliteCodeBlock

instead of this:

    md._processCodeBlock = _hiliteCodeBlock
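Why the first form is so awkward: Python mangles any name of the form __name written inside a class body to _ClassName__name, so code outside the class has to spell out the mangled name. The sketch below uses a hypothetical stand-in class, not the real parser. It also binds the replacement with types.MethodType, because assigning a plain function to an instance attribute does not bind self; whether the real extension needed that depends on how its replacement was defined, which the message doesn't show.

```python
import types


class MarkdownParser:
    """Stand-in with a 'private' method, like the refactored parser."""

    def parse(self, text):
        # Compiled as self._MarkdownParser__processCodeBlock(text).
        return self.__processCodeBlock(text)

    def __processCodeBlock(self, text):
        return "<pre>%s</pre>" % text


def hiliteCodeBlock(self, text):
    # Hypothetical replacement that an extension wants to install.
    return '<pre class="hilite">%s</pre>' % text


parser = MarkdownParser()
print(hasattr(parser, "__processCodeBlock"))  # -> False: the name was mangled
# Patch this one instance under the mangled name, binding self explicitly.
parser._MarkdownParser__processCodeBlock = types.MethodType(hiliteCodeBlock, parser)
print(parser.parse("x = 1"))  # -> <pre class="hilite">x = 1</pre>
```

The patch only affects that one instance; other MarkdownParser objects keep the original method.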

Does anyone have suggestions for how to reconcile several extensions
that each use a different subclass of MarkdownParser, without each
extension being specifically aware of the others? I don't see how
mixins would work here either. Or am I missing something obvious?

-- 
----
Waylan Limberg
wa...@gm...

Re: [Python-markdown-discuss] refactoring and treap integration

From: Yuri T. <qar...@gm...> - 2008-10-13 17:40:18

> Well, maybe not so cool. By making all (most) of the methods truly
> private it makes monkeypatching more difficult. I realize it's not that
> hard to use a subclass rather than monkeypatch, but what happens when
> two extensions each create their own subclass changing a different
> method? With monkeypatches, we just used the same instance and all was
> good.

They don't have to stay private.  I decided to start by making them
private in order to expose any questionable dependencies that we may
have.  We can then think of whether there are a few more methods that
may be worth exposing.

>    md.parser._MarkdownParser__processCodeBlock = __hiliteCodeBlock

I think there is an entirely different (and better) way to do this
now.  Use the standard MarkdownParser, then write a postprocessor to
modify the eTree.  At the moment, it appears that we don't offer an
option of modifying the tree before the patterns are run, but we
should.  I.e., our pipeline should be:

1. text pre-processors (text-in, text-out) - tempted to drop this
2. line pre-processors (line list in, line list out)
3. MarkdownParser.parseDocument() - substitute your own if you want
4. pre-pattern post-processors (modify the tree before any patterns are run)
5. InlineProcessor.applyInlinePatterns()
6. etree postprocessors (modify eTree)
7. serialization of the etree into a string
8. text postprocessors (text-in, text-out)

My generic recommendation now would be that extension writers first
look into whether they can do what they want to do by adding
post-processors at steps 4, 6 or 8, or by adding patterns.
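A rough sketch of the proposed eight-stage pipeline as a single driver function. The function name, its keyword arguments and the toy parse() are hypothetical; each stage is just a list of callables run in order, which is roughly the role the treaps would play.

```python
import xml.etree.ElementTree as etree


def run_pipeline(text, parse, *, text_pre=(), line_pre=(), tree_pre=(),
                 inline=None, tree_post=(), text_post=()):
    """Hypothetical driver for the eight stages; each stage is a list of callables."""
    for f in text_pre:                 # 1. text pre-processors: str -> str
        text = f(text)
    lines = text.split("\n")
    for f in line_pre:                 # 2. line pre-processors: list -> list
        lines = f(lines)
    root = parse(lines)                # 3. block parser: lines -> root Element
    for f in tree_pre:                 # 4. pre-pattern tree processors (in place)
        f(root)
    if inline is not None:             # 5. inline patterns (in place)
        inline(root)
    for f in tree_post:                # 6. etree post-processors (in place)
        f(root)
    html = etree.tostring(root, encoding="unicode")  # 7. serialization
    for f in text_post:                # 8. text post-processors: str -> str
        html = f(html)
    return html


def parse(lines):
    # Toy block parser: the whole document becomes a single paragraph.
    root = etree.Element("div")
    etree.SubElement(root, "p").text = " ".join(lines)
    return root


def add_class(root):
    for p in root.iter("p"):
        p.set("class", "para")


html = run_pipeline("hello\nworld", parse,
                    line_pre=[lambda lines: [line.strip() for line in lines]],
                    tree_pre=[add_class],
                    text_post=[lambda s: s.replace("world", "World")])
print(html)  # -> <div><p class="para">hello World</p></div>
```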

- yuri

-- 
http://sputnik.freewisdom.org/

Re: [Python-markdown-discuss] refactoring and treap integration

From: Waylan L. <wa...@gm...> - 2008-10-13 23:22:25

On Mon, Oct 13, 2008 at 1:40 PM, Yuri Takhteyev <qar...@gm...> wrote:
[snip]
>
>>    md.parser._MarkdownParser__processCodeBlock = __hiliteCodeBlock
>
> I think there is an entirely different (and better) way to do this
> now.  Use the standard MarkdownParser, then write a postprocessor to
> modify the eTree.

Don't know why I didn't think of this before. eTree makes it easy. I
just pushed a refactored CodeHilite extension. Much cleaner.
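A sketch of the kind of tree processor Yuri describes: instead of overriding the parser's private code-block method, walk the finished tree and adjust the <pre><code> elements. This is not the actual CodeHilite code; the hilite_code_blocks name is hypothetical, the ":::language" first-line marker is an assumption here, and the Pygments highlighting step is omitted.

```python
import xml.etree.ElementTree as etree


def hilite_code_blocks(root):
    """Hypothetical tree processor: tag <pre><code> blocks with a language class.

    A first line such as ':::python' is read as a language marker and removed;
    the actual highlighting step (e.g. with Pygments) is deliberately left out.
    """
    for pre in root.iter("pre"):
        code = pre.find("code")
        if code is None or not code.text:
            continue
        first, _, rest = code.text.partition("\n")
        if first.startswith(":::"):
            code.set("class", first[3:].strip())
            code.text = rest


root = etree.fromstring("<div><pre><code>:::python\nprint(1)\n</code></pre>"
                        "<pre><code>plain\n</code></pre></div>")
hilite_code_blocks(root)
print(etree.tostring(root, encoding="unicode"))
```

Because it only touches the tree, a processor like this needs no knowledge of the parser's internals and can coexist with other extensions' processors.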

> At the moment, it appears that we don't offer an
> option of modifying the tree before the patterns are run, but we
> should.  I.e., our pipeline should be:
>
> 1. text pre-processors (text-in, text-out) - tempted to drop this
> 2. line pre-processors (line list in, line list out)
> 3. MarkdownParser.parseDocument() - substitute your own if you want
> 4. pre-pattern post-processors (modify the tree before any patterns are run)
> 5. InlineProcessor.applyInlinePatterns()
> 6. etree postprocessors (modify eTree)
> 7. serialization of the etree into a string
> 8. text postprocessors (text-in, text-out)

Why not just make the InlineProcessor be one of the 'postprocessors'
and then extensions can add additional postprocessors either before or
after it as needed?




-- 
----
Waylan Limberg
wa...@gm...

Re: [Python-markdown-discuss] refactoring and treap integration

From: Yuri T. <qar...@gm...> - 2008-10-13 23:35:29

> Why not just make the InlineProcessor be one of the 'postprocessors'
> and then extensions can add additional postprocessors either before or
> after it as needed?

Good point.  If we then also get rid of the
preprocessor/textpreprocessor distinction, we can just reduce it all
to three:

Preprocessor treap: HtmlBlock, Header, Line, Reference
Treeprocessors treap: Inline
Postprocessors treap: Prettify, RawHtml, AndSubstitute

Extensions can then insert processors into one of those three treaps
and also insert patterns into the inline processor.  (Or they can
replace InlineProcessor with their own.)

  - yuri

-- 
http://sputnik.freewisdom.org/

Re: [Python-markdown-discuss] refactoring and treap integration

From: Waylan L. <wa...@gm...> - 2008-10-14 00:34:07

On Mon, Oct 13, 2008 at 7:35 PM, Yuri Takhteyev <qar...@gm...> wrote:
>> Why not just make the InlineProcessor be one of the 'postprocessors'
>> and then extensions can add additional postprocessors either before or
>> after it as needed?
>
> Good point.  If we then also get rid of the
> preprocessor/textpreprocessor distinction, we can just reduce it all
> to three:
>
> Preprocessor treap: HtmlBlock, Header, Line, Reference
> Treeprocessors treap: Inline
> Postprocessors treap: Prettify, RawHtml, AndSubstitute

Except that Prettify is a Treeprocessor. In any event, I like this
naming much better (pre, tree, post). It's much clearer what's going
on.

>
> Extensions can then insert processors into one of those three treaps
> and also insert patterns into the inline processor.  (Or they can
> replace InlineProcessor with their own.)

...or they can replace/subclass the MarkdownParser.

With this api, someone could use the Markdown engine to implement a
completely different markup language. Not that one should, but the
fact that one can is a testament to the api IMO.

--
---
Waylan Limberg
wa...@gm...

Re: [Python-markdown-discuss] refactoring and treap integration

From: Waylan L. <wa...@gm...> - 2008-10-20 14:26:37

On Mon, Oct 13, 2008 at 7:35 PM, Yuri Takhteyev <qar...@gm...> wrote:
>> Why not just make the InlineProcessor be one of the 'postprocessors'
>> and then extensions can add additional postprocessors either before or
>> after it as needed?
>
> Good point.  If we then also get rid of the
> preprocessor/textpreprocessor distinction, we can just reduce it all
> to three:
>

FYI, I just pushed the last of these changes. We now only have three
types of processors:

Preprocessor treap: HtmlBlock, Header, Line, Reference
Treeprocessors treap: Inline, Prettify
Postprocessors treap: RawHtml, AndSubstitute

If any of your extensions use the old TextPostprocessors or either of
the old postprocessor types, you'll need to make a few minor updates
for things to work. InlinePatterns should be unaffected; it's just
that now you can manipulate the tree before they are run if you
desire.
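A toy sketch of the resulting three-stage design, using plain lists of (name, function) pairs in place of the treaps. The inline() and prettify() functions are simplified stand-ins for the real processors, and the "note" processor is a hypothetical extension that inserts itself into the tree-processor list just ahead of "inline", so it sees the tree before any inline patterns run.

```python
import xml.etree.ElementTree as etree


def inline(root):
    # Toy inline processor: wraps the first *text* span of each <p> in <em>.
    for p in list(root.iter("p")):
        before, star, after = (p.text or "").partition("*")
        inner, star2, rest = after.partition("*")
        if star and star2:
            p.text = before
            em = etree.SubElement(p, "em")
            em.text, em.tail = inner, rest


def prettify(root):
    # Toy prettifier: put each top-level block on its own line.
    root.text = "\n"
    for child in root:
        child.tail = "\n"


def convert(text, preprocessors, treeprocessors, postprocessors):
    lines = text.split("\n")
    for _, f in preprocessors:         # lines -> lines
        lines = f(lines)
    root = etree.Element("div")
    for line in lines:                 # toy block parser: one <p> per non-empty line
        if line:
            etree.SubElement(root, "p").text = line
    for _, f in treeprocessors:        # modify the tree in place
        f(root)
    html = etree.tostring(root, encoding="unicode")
    for _, f in postprocessors:        # text -> text
        html = f(html)
    return html


def insert_before(registry, target, name, func):
    # Place (name, func) immediately ahead of the entry called target.
    index = [n for n, _ in registry].index(target)
    registry.insert(index, (name, func))


def mark_notes(root):
    # Hypothetical extension: paragraphs starting with "!" become notes.
    for p in root.iter("p"):
        if (p.text or "").startswith("!"):
            p.set("class", "note")
            p.text = p.text[1:]


treeprocessors = [("inline", inline), ("prettify", prettify)]
insert_before(treeprocessors, "inline", "note", mark_notes)
print(convert("Hello *there*\n!Careful", [], treeprocessors, []))
```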

I should also mention that all this stuff is fully documented in
docs/writing_extensions.txt. Any improvements, corrections,
suggestions are welcome.

-- 
----
Waylan Limberg
wa...@gm...