Thread: [Python-markdown-discuss] Overriding Functions, etc.

Brought to you by: qaramazov, waylanhl

From: Ben W. <da...@gm...> - 2007-04-10 11:56:41

I'm currently playing with a Python-based flat-file wiki that relies
on Markdown syntax (mostly). This is more for my education as I
understand MoinMoin is the premier wiki-engine for Python and it seems
to support some Markdown markup.[1] So, I've made a few observations
that I'm trying to explain here.

I would like to say the Python-Markdown class is great. I like being
able to extend it. For example, I have a different way of handling the
==== and ---- headings that encourages me to short-circuit the
headings_preprocessor. The reason for my short-circuiting is that I
order headers differently and include ++++ and ~~~~, and the
preprocessor presupposes that a section heading (====) should be
<h1>.[2] I'm able to get both lines of a heading (the title and
markup) by a regex without lookaheads, but it may not be stable.

I would also like to change the output to Tex/LaTeX format if possible.

I was thinking that _perhaps_ the pre-processors could be a dictionary
instead of an array, which would allow deleting by name. Presently, I
set all the preprocessors by copying from the class, then delete the
one preprocessor I don't want. This would also allow use of has_key(),
as the present implementation (correct me if I'm wrong) does not allow
listing or deleting a preprocessor directly.

I'm also having a bit of a problem with the documentation. I'm used to
an "undue experimentation" rule for documentation. This means that a
programmer should be able to look to the documentation alone to
explain _how_ to use the class---such as adding a preprocessor. I
ended up having to look in the class itself to see how it added a
preprocessor. Perhaps the example given could be an actual
implementation of a simple preprocessor?

FWIW, I am adding some non-Markdown syntax that is more expected in
wiki markup (e.g. [[free links]]).

Anyway, thanks for a great Markdown class. It moved my development of
my wiki up by about three weekends.

-- 
Ben Wilson
"Words are the only thing which will last forever" Churchill

[1]: Why Python? I am already a strong Perl and PHP programmer, and
I'm trying to extend my language skills.

[2]: The only <h1> I have on the page is the page title. This means
section headings (====) are <h2>; sub-sections (----) are <h3>; paragraph
headings (~~~~) are <h4>; and sub-paragraph headings (++++) are <h5>. I
never use <h6>. Reserving <h1> for the page title is based on some older
web standard that's been around for a while and that I am never able to
find when I want to cite it.
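
The heading scheme in [2] could be handled by a single regex substitution
along the following lines. This is a minimal standalone sketch, not
Python-Markdown's preprocessor API: LEVELS, HEADING and convert_headings
are made-up names, and emitting HTML strings directly is an assumption
made to keep the example short. The pattern captures both the title line
and the underline with a backreference, so no lookahead is needed.

```python
import re

# Underline character -> heading level, as listed in footnote [2].
LEVELS = {"=": 2, "-": 3, "~": 4, "+": 5}

# A title line followed by a line of three or more identical underline
# characters. Group 1 is the title, group 2 is the underline character.
HEADING = re.compile(r"^(.+)\n([=\-~+])\2{2,}[ \t]*$", re.M)


def convert_headings(text):
    """Replace each two-line heading with an <hN> element."""
    def repl(match):
        level = LEVELS[match.group(2)]
        return "<h%d>%s</h%d>" % (level, match.group(1).strip(), level)
    return HEADING.sub(repl, text)


sample = "Intro\n=====\n\nDetails\n-----\ntext"
converted = convert_headings(sample)
```

Note that a run of "----" is also a horizontal rule in standard Markdown,
so a real preprocessor would need to decide which reading wins.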

Re: [Python-markdown-discuss] Overriding Functions, etc.

From: Waylan L. <wa...@gm...> - 2007-04-10 16:38:14

Hi Ben,

Ben Wilson wrote:
> I'm currently playing with a Python-based flat-file wiki that relies
> on Markdown syntax (mostly). This is more for my education as I
> understand MoinMoin is the premier wiki-engine for Python and it seems
> to support some Markdown markup.[1] So, I've made a few observations
> that I'm trying to explain here.
> 
> I would like to say the Python-Markdown class is great. I like being
> able to extend it. For example, I have a different way of handling the
> ==== and ---- headings that encourages me to short-circuit the
> headings_preprocessor. The reason for my short-circuiting is that I
> order headers differently and include ++++ and ~~~~, and the
> preprocessor presupposes that a section heading (====) should be
> <h1>.[2] I'm able to get both lines of a heading (the title and
> markup) by a regex without lookaheads, but it may not be stable.
> 
> I would also like to change the output to Tex/LaTeX format if possible.

I've thought about that as well. Currently, most of the output is stored 
in a minidom instance (the exception being raw htmlblocks). I suppose it 
could be possible to extend the minidom class to output different 
formats, but I haven't looked into it. Perhaps if python-markdown didn't
use its own custom minidom class, but one of the more common Python DOM
modules, this would be easier (code might already exist). I don't know
what Yuri's thoughts about this would be.
> 
> I was thinking that _perhaps_ the pre-processors could be a dictionary
> instead of an array, which would allow deleting by name. Presently, I
> set all the preprocessors by copying from the class, then delete the
> one preprocessor I don't want. This would also allow use of has_key(),
> as the present implementation (correct me if I'm wrong) does not allow
> listing or deleting a preprocessor directly.

This seems like a good idea until we remember that dictionaries do not
preserve order. In this case, order is very important, as certain
pre-processors must absolutely be run before others. That's impossible
with a dictionary. With a little searching you'll find that various
projects have implemented their own non-standard sorted-dict to address
this issue, but every implementation is a little different. Another
possibility could be a list of tuples [(key, value), (key, value)], but
that can be a pain to work with. That is why a simple list is used.
Currently it is easy enough to refer to each item by its index, but if
you're making multiple changes, I can see how that could be problematic.
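
The list-of-tuples idea can be made less painful with a thin wrapper. The
sketch below is only an illustration under stated assumptions:
OrderedRegistry, its method names and the preprocessor names are all
hypothetical and are not part of Python-Markdown. It keeps run order in a
plain list while still allowing lookup, insertion and deletion by name.
(Since Python 3.7 the built-in dict does keep insertion order, but it
still has no way to insert before an existing key.)

```python
class OrderedRegistry:
    """Hypothetical wrapper around a list of (name, item) tuples."""

    def __init__(self, pairs=()):
        self._pairs = list(pairs)

    def names(self):
        return [name for name, _ in self._pairs]

    def __contains__(self, name):
        return name in self.names()

    def __iter__(self):
        # Iterate over the items themselves, in run order.
        return (item for _, item in self._pairs)

    def index(self, name):
        return self.names().index(name)  # raises ValueError if missing

    def insert_before(self, target, name, item):
        self._pairs.insert(self.index(target), (name, item))

    def remove(self, name):
        del self._pairs[self.index(name)]


# Stand-in callables; real preprocessors would be objects with a run method.
pre = OrderedRegistry([
    ("html_block", str.strip),
    ("headers", str.upper),
    ("line_breaks", str.lower),
])
pre.remove("headers")                                   # delete by name
pre.insert_before("line_breaks", "wiki_links", str.title)
```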
> 
> I'm also having a bit of a problem with the documentation. I'm used to
> an "undue experimentation" rule for documentation. This means that a
> programmer should be able to look to the documentation alone to
> explain _how_ to use the class---such as adding a preprocessor. I
> ended up having to look in the class itself to see how it added a
> preprocessor. Perhaps the example given could be an actual
> implementation of a simple preprocessor?

Generally the footnote extension is referred to as an example as it uses 
all three methods of adding extensions (pre-processors, patterns, and
post-processors).

> 
> FWIW, I am adding some non-Markdown syntax that is more expected in
> wiki markup (e.g. [[free links]]
> 
> Anyway, thanks for a great Markdown class. It moved my development of
> my wiki up by about three weekends.
> 

-- 
Waylan Limberg
wa...@gm...

Re: [Python-markdown-discuss] Overriding Functions, etc.

From: Ben W. <da...@gm...> - 2007-04-10 17:02:11

On 4/10/07, Waylan Limberg <wa...@gm...> wrote:
[...]
> > I was thinking that _perhaps_ the pre-processors could be a dictionary
> > instead of an array, which would allow deleting by name. Presently, I
> > set all the preprocessors by copying from the class, then delete the
> > one preprocessor I don't want. This would also allow use of has_key(),
> > as the present implementation (correct me if I'm wrong) does not allow
> > listing or deleting a preprocessor directly.
>
> This seems like a good idea until we remember that dictionaries do not
> preserve order. In this case, order is very important, as certain
> pre-processors must absolutely be run before others. That's impossible
> with a dictionary. With a little searching you'll find that various
> projects have implemented their own non-standard sorted-dict to address
> this issue, but every implementation is a little different. Another
> possibility could be a list of tuples [(key, value), (key, value)], but
> that can be a pain to work with. That is why a simple list is used.
> Currently it is easy enough to refer to each item by its index, but if
> you're making multiple changes, I can see how that could be problematic.

Right, dict is not sequenced, but there are a couple ways off-hand
that might work.

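def PreprocessorInsert(self, key, value, index=None):
    self.preprocessors[key] = value
    if index is None:  # default: append to the end of the run order
        index = len(self.preprocessors_order)
    self.preprocessors_order.insert(index, key)  # list.insert(index, item)

then when running preprocessors:
    for key in self.preprocessors_order:
        pre = self.preprocessors[key]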

PmWiki has a situation where markups may be added willy-nilly while
maintaining order. It would be rather radical to introduce to
Markdown(). I'll try to describe it as best I can as there are two
ways to position markups: by group and relative to other markups.
First, the basic syntax for adding markup is:

Markup('name','phase','regex','substitute')

* Name refers to the key value of the regex, which allows a standard
markup to be overridden by custom markup, and easy identification.
* Phase refers to position either by category (e.g. preprocessor,
inline, postprocessor) or relationally (e.g. '<##' would occur before
'##', and '<inline' would essentially occur before all other inlines).
The '##' is the name of the markup it precedes (or else the name of
the phase), not the markup itself.
* Regex is self-explanatory.
* Substitution is self-explanatory, and it can be either text or a function.

The code does the shuffling before text conversion. This has the
advantage of not needing to know the sequence per se, only that you
want the conversion to occur before/after/during another item. I
mention PmWiki only because I'm very familiar with its approach and
know its author seeks ease-of-customization. Markdown() generally is
not meant to be as customizable, since it follows the Markdown standard
format.

[...]
> > Perhaps the example given could be an actual
> > implementation of a simple preprocessor?
>
> Generally the footnote extension is referred to as an example as it uses
> all three methods of adding extensions (pre-processors, patterns, and
> post-processors).

Thanks!

Re: [Python-markdown-discuss] Overriding Functions, etc.

From: Waylan L. <wa...@gm...> - 2007-04-10 23:16:05

Ben Wilson wrote:
[snip]
> PmWiki has a situation where markups may be added willy-nilly while
> maintaining order. It would be rather radical to introduce to
> Markdown(). 

And not very pythonic. I remember the first time I realized how PmWiki
did some very OO-like things without OO code. For PHP it was amazing -
and a pleasure to work with. Especially considering PHP's OO syntax. Ugh!

But if one tried to use PmWiki's approach in python, it would probably
be more work than it's worth. A subclass of dict which maintains order
or a class wrapping a list of tuples would be much less effort -- and
more pythonic. For that matter, it wouldn't be all that difficult to build
a class from scratch for such a purpose.

[snip]
> want the conversion to occur before/after/during another item. I
> mention PmWiki only because I'm very familiar with its approach and
> know its author seeks ease-of-customization. Markdown() generally does
> not mean to be as customizable as it follows the Markdown standard
> format.

Ahh, now I know why your name seemed so familiar. Although I've been out
of the (PmWiki) loop for about a year now. It is true that Markdown does not
come close to PmWiki. If you're looking for more power, perhaps you
should look at reStructuredText [1]. It seems to be the Python default
for markup, is easily extensible [2], and will output LaTeX [3].
Personally, I prefer Markdown for its simplicity, but you seem to want
power, which brings more complexity. IMO, using an established markup
language (reST) is better than building your own custom creation.

[1]: http://docutils.sourceforge.net/rst.html
[2]: http://docutils.sourceforge.net/docs/howto/rst-directives.html
[3]: http://docutils.sourceforge.net/docs/user/latex.html

-- 
Waylan Limberg
wa...@gm...

Re: [Python-markdown-discuss] Overriding Functions, etc.

From: Yuri T. <qar...@gm...> - 2007-04-10 23:42:06

Just wanted to let you guys know that I am reading this, but don't
have time to think about it seriously and respond this week.  However,
from what I see so far, I think Ben identified a real problem and I
would love it if you guys could come up with a solution that addresses
most of the points that have been brought up so far.

Ideally, this solution would maintain backwards compatibility with
existing extensions.  If not, we can still put it in, but we'll have
to think more carefully of when to release it and whether there should
be a more general upgrade of how the extension mechanism works.
(I.e., I think it's ok to change the extension framework once, but not
every day.)

  - yuri


-- 
Yuri Takhteyev
UC Berkeley School of Information
http://www.freewisdom.org/

Re: [Python-markdown-discuss] Overriding Functions, etc.

From: Ben W. <da...@gm...> - 2007-06-09 01:59:14

It's been a while since we discussed this (April), but I thought I'd
come back. I've looked at how PmWiki organizes the various markups as
compared to Markdown.

In response to my statement that PmWiki had an elegant, ad-hoc method
for adding new markup, Waylan said: "And not very pythonic. I remember
the first time I realized how PmWiki did some very OO like things
without OO code. For PHP it was amazing - and a pleasure to work with.
Especially considering PHP's OO syntax. Ugh!"

I've since taken the time to analyze how Patrick Michaud accomplished
this. Quite simply, he uses a hash-of-hashes to organize markup
relative to other markup (e.g., Strong before Emphasis). At
parse-time, he then passes this H-o-H through a custom heap algorithm
to divine the absolute parse order. I re-implemented his solution in
Python. It is very Pythonic since his custom heap exists in Python's
heapq library. This means the sorting is likely optimized in C. I
think Waylan "failed to see the forest for all of the trees" because
he allowed the confines of PHP to conceal the simple elegance of the
solution.

He also focused on the big-picture, which was PmWiki, and did not see
the small facet I was focusing on, which was markup management. What
Patrick solved was how to allow a developer simply to insert new
markup into a markup tree. Rather than extend the class, or mess with
the internals of class Markdown, Patrick's solution allows flexibility
in the class. The way Markdown is now, in order for me to add some
behavior I wanted, I had to tinker with Markdown class' internals.
Now, to add markup, all I need to do is tell my parser that I want it
to occur during inline, or even that it must occur before Emphasis.
Thus, for a wiki engine that allows developers to insert/change markup
by plug-in, the process is very OO. There's a reason Patrick is a PhD.
While PHP is inelegant, and Patrick's code is sometimes confusing, I
am constantly amazed at how he solves problems.

I invite you to consider PmWiki's markup engine (specifically the
functions Markup() and BuildMarkupRules()). The former shows how to
extend markup ad hoc. The latter shows how to take the resulting
heap and build a parse tree.
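
To make the idea of resolving relative positions into one absolute order
concrete, here is a small sketch. It is not PmWiki's algorithm and not the
heap-based re-implementation mentioned above: it swaps in a topological
sort from Python's standard graphlib module (Python 3.9+), and the
"before" field name and the sample markups are assumptions made for
illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hash-of-hashes: each markup name maps to its constraints.
# "before" lists the markups that this one must run ahead of.
markups = {
    "strong": {"before": ["emphasis"]},
    "emphasis": {"before": []},
    "link": {"before": ["strong"]},
}


def resolve_order(markups):
    """Turn relative 'X before Y' constraints into one run order."""
    sorter = TopologicalSorter()
    for name, rule in markups.items():
        sorter.add(name)             # keep markups that have no constraints
        for later in rule["before"]:
            sorter.add(later, name)  # `name` must run ahead of `later`
    return list(sorter.static_order())


order = resolve_order(markups)
```

Contradictory constraints (A before B and B before A) make static_order()
raise graphlib.CycleError, which gives a natural place to report a
misconfigured plug-in.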

The only problem is that implementing this would not be backward
compatible. But, this is Pythonic as well, as the BDFL willingly
disregards tradition when warranted. It is not backward compatible
because it totally dismisses the present mechanism for ordering
markup. However, I think the gains are worth the cost.

Warm Regards,
Ben Wilson

-- 
Ben Wilson
"Words are the only thing which will last forever" Churchill

Re: [Python-markdown-discuss] Overriding Functions, etc.

From: Yuri T. <qar...@gm...> - 2007-06-09 15:43:04

I am sorry I didn't follow up on this thread.  It came at a time
when I was super busy and I then didn't get around to going back to
it, though it's been in the back of my mind.

I am willing to discuss the question of how post and pre-processing is
organized, even if some of the solutions are not going to be backwards
compatible.  I wouldn't want to make such changes on a whim, but we
can start thinking of version "2.0", which could potentially be quite
different.  I am not sure I will attempt to do a radical redesign on
my own, but if there are other people interested, we could do it as a
community project.

Ben, can you send us a more detailed explanation of your proposal?

However, if we start talking about a radical change ("2.0"), then I
think we also need to talk about a more serious architectural problem,
which is the uncomfortable mix of regular expressions and DOM trees.
The current parser is based on regular expressions. Once a regular
expression is applied, we typically break the string in half, which
prevents us from matching later regular expressions.  E.g.: we start
with "**[foo](x.html)**", and match the link pattern.  This gives us a
list ["**", DOM_FRAGMENT, "**"].  We can no longer match the
"**...**" pattern.
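
The problem can be reproduced in a few lines. This is a simplified
sketch, not the actual Python-Markdown code: the two patterns are
cut-down stand-ins, and a tuple stands in for the DOM fragment.

```python
import re

# Simplified stand-ins for the real inline patterns.
LINK = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")
STRONG = re.compile(r"\*\*(.+?)\*\*")


def split_on_links(text):
    """Return a list mixing plain strings and ('a', href, label) tuples."""
    parts, pos = [], 0
    for match in LINK.finditer(text):
        if match.start() > pos:
            parts.append(text[pos:match.start()])
        # The tuple stands in for a DOM node.
        parts.append(("a", match.group(2), match.group(1)))
        pos = match.end()
    if pos < len(text):
        parts.append(text[pos:])
    return parts


parts = split_on_links("**[foo](x.html)**")

# Try the strong pattern on each remaining string fragment.
strong_hits = [p for p in parts if isinstance(p, str) and STRONG.search(p)]
```

Here parts holds "**", the link tuple, and "**". Neither string fragment
contains a complete "**...**" pair on its own, so strong_hits stays empty.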

I've thought of a few possible solutions for it:

1. Ditch the DOM and just do a bunch of string-to-string
transformations.  This might be the most straightforward solution, but
it is very un-pythonic and not something I would be interested in doing
personally.

2. Write a special data structure that can behave as a list or tree of
DOM fragments while also fitting with the current RE library.  One way
to do that would be to represent the half-parsed document as a string
and a list of DOM nodes, where the string would have placeholders for
the DOM nodes.  In this case, instead of ["**", DOM_FRAGMENT, "**"] we
would have an object with fields str = "**\x0**", doms =
[DOM_FRAGMENT].  We could then run doc.str through a regular expression,
check if any part of the match contains the placeholders, then work
out the details.

3. Switch to some other method of parsing.  Maybe something from this
list: http://nedbatchelder.com/text/python-parsers.html

Note that if we go for #3, then the whole preprocessors/postprocessors
thing would end up looking very different.
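
Option 2 might look roughly like the sketch below. Everything here is an
assumption made for illustration: the HalfParsed class, the tuple
stand-ins for DOM nodes, and in particular the placeholder encoding (a
NUL byte, the node's index, then another NUL byte), which is one possible
reading of the "\x0" notation above.

```python
import re

LINK = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")
STRONG = re.compile(r"\*\*(.+?)\*\*")
PLACEHOLDER = re.compile(r"\x00(\d+)\x00")  # assumed encoding: NUL, index, NUL


class HalfParsed:
    """A string with placeholders plus the nodes they stand for."""

    def __init__(self, text):
        self.str = text
        self.doms = []

    def stash(self, node):
        # Store the node and return the placeholder that refers to it.
        self.doms.append(node)
        return "\x00%d\x00" % (len(self.doms) - 1)


def parse_inline(text):
    doc = HalfParsed(text)
    doc.str = LINK.sub(lambda m: doc.stash(("a", m.group(2), m.group(1))), doc.str)
    # The link is now a placeholder, so the strong pattern can match around it.
    doc.str = STRONG.sub(lambda m: doc.stash(("strong", m.group(1))), doc.str)
    return doc


def resolve(doc, text):
    """Expand placeholders in `text` into a list of strings and nodes."""
    out = []
    for i, piece in enumerate(PLACEHOLDER.split(text)):
        if i % 2:  # odd pieces are the captured placeholder indices
            node = doc.doms[int(piece)]
            if node[0] == "strong":
                node = ("strong", resolve(doc, node[1]))
            out.append(node)
        elif piece:
            out.append(piece)
    return out


doc = parse_inline("**[foo](x.html)**")
tree = resolve(doc, doc.str)
```

The design choice is that every pattern keeps working on a flat string,
and nesting is only reconstructed at the end when placeholders are
expanded.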

  - yuri


-- 
Yuri Takhteyev
UC Berkeley School of Information
http://www.freewisdom.org/

Re: [Python-markdown-discuss] Overriding Functions, etc.

From: Ben W. <da...@gm...> - 2007-06-09 20:20:10

I need to modify something I said. There is no need to use Python's
heapq; I managed to reduce the sort to a nested array assignment.

On 6/9/07, Yuri Takhteyev <qar...@gm...> wrote:

> Ben, can you send us a more detailed explanation of your proposal?

I propose a different, flexible method for prioritizing markup
processing. This method has no effect on pattern matching/substitution
(i.e., processors); so the DOM method you're using remains intact.
While I personally prefer the string-to-string substitution, what is
proposed is agnostic to how markup is processed. So, what I propose is
a new organizer for the processors, not a new way to process.

For example, preprocessors are ordered in an array: self.preprocessor.
Postprocessors are likewise ordered. There are other similar
"buckets." If I wanted to insert a preprocessor between two standard
preprocessors (e.g. HTML_BLOCK and LINE_BREAKS), then I have to
manipulate the array.

PmWiki's organizer is flexible. Each processor is named (dictionary or
associative array-based). Each processor announces when it should be
processed: before another processor, after another processor, or
generally within a processor group. For example, if STRONG must occur
before EMPHASIS, then we have:

    p.register('strong','<emphasis',...)

If, alternatively, STRONG must occur _after_ EMPHASIS, then we have:

    p.register('strong','>emphasis',...)

Finally, if we only want STRONG to occur at the same time as other
inline processors, then we have:

    p.register('strong','inline',...)

Replacing a processor is as easy as re-registering it. You can also
deregister a processor.

As we all know, dictionary elements are not ordered. The problem of
order would exist even if dictionaries were ordered. This is because
it is possible to register a new processor at any time before parsing
begins and properly ordering in any language's associative array would
be a royal pain. Patrick provides a solution in his code: have each
processor register its relative order via a heap algorithm. When the
heap is sorted at parse-time, the relationships between various
markups resolve to the final process order.
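
As a rough illustration of the ordering step described above, here is a
minimal sketch. It is not Patrick's algorithm and not the code from the
linked files; it swaps in a plain topological sort that uses heapq only
to break ties by registration order. The function name resolve_order and
the treatment of a bare group name as "after the group marker" are
assumptions made for the example.

```python
import heapq


def resolve_order(rules):
    """Order rule names so that every '<x' / '>x' constraint holds.

    `rules` maps a name to its `where` string, in registration order.
    Assumption: a bare group name is treated like '>group'.
    """
    seq = {name: i for i, name in enumerate(rules)}   # registration order
    after = {name: set() for name in rules}           # name -> names that follow it
    pending = {name: 0 for name in rules}             # count of unmet predecessors

    for name, where in rules.items():
        target = where.lstrip('<>')
        if target not in rules or target == name:
            continue                                  # no usable constraint
        if where.startswith('<'):
            first, second = name, target              # name runs before target
        else:
            first, second = target, name              # name runs after target
        if second not in after[first]:
            after[first].add(second)
            pending[second] += 1

    # The heap always yields the earliest-registered rule that is ready.
    ready = [(seq[n], n) for n in rules if pending[n] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, name = heapq.heappop(ready)
        order.append(name)
        for follower in after[name]:
            pending[follower] -= 1
            if pending[follower] == 0:
                heapq.heappush(ready, (seq[follower], follower))

    if len(order) != len(rules):
        raise ValueError('circular ordering constraints')
    return order


rules = {
    'inline': '',            # a bare group marker
    'emphasis': 'inline',    # belongs to the inline group
    'strong': '<emphasis',   # must run before emphasis
    'links': '>emphasis',    # must run after emphasis
}
order = resolve_order(rules)
```

Because the order is computed only when asked for, rules can be
registered in any sequence before parsing begins.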

I believe this organizer is more OO than the current Markdown
implementation. Mind you, I come from a couple decades of functional
programming so my understanding of OO is hard-earned. I believe a
proper class avoids having to manipulate its internal structure.
Having to play with self.preprocessor to add a processor is exactly
that kind of manipulation.

You would add another class to the Markdown suite: "Parser." This
class would have the following methods: register, deregister, sorted,
and parse. The first two should be self-descriptive. Sorted() would
build an array which properly orders the keys for the registered
processors. Parse would receive the markdowned text and convert that
text into HTML.

None Parser.register(key, where, pattern, replacement, constants(e.g. re.M))

None Parser.deregister(key or [keys])

List Parser.sorted()

html_text Parser.parse(markdown_text)
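
To make the four methods concrete, here is a small text-in-text-out
sketch of such a class. It is an illustration under assumptions, not the
code in the linked files: the `flags` parameter stands in for the
"constants" argument, ordering is delegated to the standard library's
graphlib.TopologicalSorter (Python 3.9+) instead of a heap, and a bare
group name is treated as "after the group marker".

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class Parser:
    def __init__(self):
        self._rules = {}  # key -> (where, compiled pattern or None, replacement)

    def register(self, key, where, pattern=None, replacement='', flags=0):
        """Add or replace a processor; re-registering a key overwrites it."""
        compiled = re.compile(pattern, flags) if pattern else None
        self._rules[key] = (where, compiled, replacement)

    def deregister(self, keys):
        """Remove one key or a list of keys; unknown keys are ignored."""
        if isinstance(keys, str):
            keys = [keys]
        for key in keys:
            self._rules.pop(key, None)

    def sorted(self):
        """Return the keys in an order that satisfies every constraint."""
        ts = TopologicalSorter()
        for key, (where, _, _) in self._rules.items():
            ts.add(key)
            target = where.lstrip('<>')
            if target not in self._rules:
                continue                 # no constraint, or unknown target
            if where.startswith('<'):
                ts.add(target, key)      # key must come before target
            else:
                ts.add(key, target)      # key must come after target
        return list(ts.static_order())

    def parse(self, text):
        """Apply each registered substitution in resolved order."""
        for key in self.sorted():
            _, pattern, replacement = self._rules[key]
            if pattern is not None:
                text = pattern.sub(replacement, text)
        return text


p = Parser()
p.register('inline', '')  # group marker, no pattern
p.register('emphasis', 'inline', r'\*(.+?)\*', r'<em>\1</em>')
p.register('strong', '<emphasis', r'\*\*(.+?)\*\*', r'<strong>\1</strong>')
html = p.parse('a **bold** and *soft* word')
```

Note that 'strong' is registered last but still runs before 'emphasis',
which is the point of announcing relative order at registration time.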

-- 
Ben Wilson
"Words are the only thing which will last forever" Churchill

Re: [Python-markdown-discuss] Overriding Functions, etc.

From: Yuri T. <qar...@gm...> - 2007-06-13 03:40:32

This looks good.  It is, however, tied to the question of how the
processors work, so those two issues need to be discussed together.
This implementation assumes that everything is text-in-text-out.
While it is possible to do it this way (that's how Markdown.pl works,
if I remember correctly), I think it will get pretty ugly if we try to
do structural markup this way.  But looking at your code I am starting
to wonder if perhaps the thing to do is to strike a compromise and
work with a tree at the structural level, while using regexp
substitution for the low-level markup.  This way, some handlers can
return text but others can return a tree node:

    "__...__" -> returns "<em>...</em>"
    "## Title" -> returns a tree node for "H2", having applied the remaining
            handlers recursively to the text node of the child.
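
A minimal sketch of that compromise, assuming a hypothetical Node class
and handler names that are not part of python-markdown: the inline
handler returns a string, while the heading handler returns a tree node
whose text child has already been run through the inline handlers.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node; children are strings or other Nodes."""
    tag: str
    children: list = field(default_factory=list)


def emphasis(text):
    # Low-level markup: text in, text out (follows the "__...__" example).
    return re.sub(r'__(.+?)__', r'<em>\1</em>', text)


def heading(line, inline_handlers):
    # Structural markup: returns a Node, or None if the line is no heading.
    m = re.match(r'(#{1,6})\s+(.*)', line)
    if m is None:
        return None
    text = m.group(2)
    for handler in inline_handlers:
        text = handler(text)             # apply the remaining handlers
    return Node('h%d' % len(m.group(1)), [text])


node = heading('## A __big__ title', [emphasis])
```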

I will try to think about this more next weekend.

Another thing: Part of your code seems to implement a general
register-deregister-sort logic which would potentially be useful for
things other than markdown.  Have you thought of wrapping it up into a
separate module?  This way inside python-markdown one would simply
use:

    import treeregistry  ## just making up a name for now

    r = treeregistry.Registry()
    r.register('fulltext','>_begin')
    r.register('split','>fulltext')
    ...
    r.register('[[', 'links', r'(\[\[\s*(.*?)\]\])(s?)', make_link)
    load_extension(r)
    processors = r.get_sorted()

Then from here on we just use a list of pre-sorted processors.

  - yuri

On 6/10/07, Ben Wilson <da...@gm...> wrote:
> Yuri,
>
> Here is code demonstrating what I am referring to. I created a file
> called 'src' which contained a snippet of marked up text, which was
> converted into HTML. Perhaps the merits are clearer, and you'll be
> able to adjust Markdown to use this processor organizer. Both are the
> same, but I believe the latter is optimized.
>
> http://dausha.net/parse.py.txt
> http://dausha.net/heap.py.txt
>
> Ben Wilson
>
>
> On 6/9/07, Yuri Takhteyev <qar...@gm...> wrote:
> > I am sorry I didn't follow up on this thread.  It came at a time
> > when I was super busy and I then didn't get around to going back to
> > it, though it's been on the back of my mind.
> >
> > I am willing to discuss the question of how post and pre-processing is
> > organized, even if some of the solutions are not going to be backwards
> > compatible.  I wouldn't want to make such changes on a whim, but we
> > can start thinking of version "2.0", which could potentially be quite
> > different.  I am not sure I will attempt to do a radical redesign on
> > my own, but if there are other people interested, we could do it as a
> > community project.
> >
> > Ben, can you send us a more detailed explanation of your proposal?
> >
> > However, if we start talking about a radical change ("2.0"), then I
> > think we also need to talk about a more serious architectural problem,
> > which is the uncomfortable mix of regular expressions and dom trees.
> > The current parser is based on regular expressions, once a regular
> > expression is applied we typically break the string in half, which
> > prevents us from matching later regular expressions.  E.g.: we start
> > with "**[foo](x.html)**", and match the link pattern.  This gives us a
> > list ["**", DOM_FRAGMENT, "**"].  We can't match the "**...**"
> > now.
> >
> > I've thought of a few possible solutions for it:
> >
> > 1. Ditch the DOM and just do a bunch of strings-to-strings
> > transformation.  This might be the most straightforward solution, but
> > very un-pythonic and not something I would be interested in doing
> > personally.
> >
> > 2. Write a special data structure that can behave as a list or tree of
> > DOM fragments while also fitting with the current RE library.  One way
> > to do that would be to represent the half-parsed document as a string
> > and a list of DOM nodes, where the string would have placeholders for
> > the DOM nodes.  In this case, instead of ["**", DOM_FRAGMENT, "**"] we
> > would have an object with fields str = "**\x0**", doms =
> > [DOM_FRAGMENT].  We could then run doc.str through regular expression,
> > check if any part of the match contains the placeholders, then work
> > out the details.
> >
> > 3. Switch to some other method of parsing.  Maybe something from this
> > list: http://nedbatchelder.com/text/python-parsers.html
> >
> > Note that if we go for #3, then the whole preprocessors/postprocessors
> > thing would end up looking very different.
> >
> >   - yuri
> >
> > On 6/8/07, Ben Wilson <da...@gm...> wrote:
> > > It's been a while since we discussed this (April), but I thought I'd
> > > come back. I've looked at how PmWiki organizes the various markups as
> > > compared to Markdown.
> > >
> > > In response to my statement that PmWiki had an elegant, ad-hoc method
> > > for adding new markup, Waylan said: "And not very pythonic. I remember
> > > the first time I realized how PmWiki did some very OO like things
> > > without OO code. For PHP it was amazing -
> > > and a pleasure to work with. Especially considering PHP's OO syntax. Ugh!"
> > >
> > > I've since taken the time to analyze how Patrick Michaud accomplished
> > > this. Quite simply, he uses a hash-of-hashes to organize markup
> > > relative to other markup (e.g., Strong before Emphasis). At
> > > parse-time, he then passes this H-o-H through a custom heap algorithm
> > > to divine the absolute parse order. I re-implemented his solution in
> > > Python. It is very Pythonic since his custom heap exists in Python's
> > > heapq library. This means the sorting is likely optimized in C. I
> > > think Waylan "failed to see the forest for all of the trees" because
> > > he allowed the confines of PHP to conceal the simple elegance of the
> > > solution.
> > >
> > > He also focused on the big-picture, which was PmWiki, and did not see
> > > the small facet I was focusing on, which was markup management. What
> > > Patrick solved was how to allow a developer simply to insert new
> > > markup into a markup tree. Rather than extend the class, or mess with
> > > the internals of class Markdown, Patrick's solution allows flexibility
> > > in the class. The way Markdown is now, in order for me to add some
> > > behavior I wanted, I had to tinker with Markdown class' internals.
> > > Now, to add markup, all I need to do is tell my parser that I want it
> > > to occur during inline, or even that it must occur before Emphasis.
> > > Thus, for a wiki engine that allows developers to insert/change markup
> > > by plug-in, the process is very OO. There's a reason Patrick is a PhD.
> > > While PHP is inelegant, and Patrick's code is sometimes confusing, I
> > > am constantly amazed at how he solves problems.
> > >
> > > I invite you to consider PmWiki's Markup engine (specifically function
> > > Markup(); and BuildMarkupRules();) The former instructs on how to
> > > extend markup ad-hoc. The latter instructs how to take the resulting
> > > heap and build a parse tree.
> > >
> > > The only problem is that implementing this would not be backward
> > > compatible. But, this is Pythonic as well, as the BDFL willingly
> > > disregards tradition when warranted. It is not backward compatible
> > > because it totally dismisses the present mechanism for ordering
> > > markup. However, I think the gains are worth the cost.
> > >
> > > Warm Regards,
> > > Ben Wilson
> > >
> > > On 4/10/07, Yuri Takhteyev <qar...@gm...> wrote:
> > > > Just wanted to let you guys know that I am reading this, but don't
> > > > have time to think about it seriously and respond this week.  However,
> > > > from what I see so far, I think Ben identified a real problem and I
> > > > would love it if you guys could come up with a solution that addresses
> > > > most of the points that have been brought up so far.
> > > >
> > > > Ideally, this solution would maintain backwards compatibility with
> > > > existing extensions.  If not, we can still put it in, but we'll have
> > > > to think more carefully of when to release it and whether there should
> > > > be a more general upgrade of how the extension mechanism works.
> > > > (I.e., I think it's ok to change the extension framework once, but not
> > > > every day.)
> > > >
> > > >   - yuri
> > > >
> > > > On 4/10/07, Waylan Limberg <wa...@gm...> wrote:
> > > > >
> > > > >
> > > > > Ben Wilson wrote:
> > > > > [snip]
> > > > > > PmWiki has a situation where markups may be added willy-nilly while
> > > > > > maintaining order. It would be rather radical to introduce to
> > > > > > Markdown().
> > > > >
> > > > > And not very pythonic. I remember the first time I realized how PmWiki
> > > > > did some very OO like things without OO code. For PHP it was amazing -
> > > > > and a pleasure to work with. Especially considering PHP's OO syntax. Ugh!
> > > > >
> > > > > But if one tried to use PmWiki's approach in python, it would probably
> > > > > be more work than it's worth. A subclass of dict which maintains order
> > > > > or a class wrapping a list of tuples would be much less effort -- and
> > > > > more pythonic. For that matter, it wouldn't be all that difficult to build
> > > > > a class from scratch for such a purpose.
> > > > >
> > > > > [snip]
> > > > > > want the conversion to occur before/after/during another item. I
> > > > > > mention PmWiki only because I'm very familiar with its approach and
> > > > > > know its author seeks ease-of-customization. Markdown() generally does
> > > > > > not mean to be as customizable as it follows the Markdown standard
> > > > > > format.
> > > > >
> > > > > Ahh, now I know why your name seemed so familiar. Although I've been out
> > > > > of the (PmWiki) loop for about a year now. It is true that Markdown does not
> > > > > come close to PmWiki. If you're looking for more power, perhaps you
> > > > > should look at reStructuredText [1]. It seems to be the python default
> > > > > for markup, is easily extendable [2], and will output LaTeX [3].
> > > > > Personally, I prefer Markdown for its simplicity, but you seem to want
> > > > > power which brings more complexity. Imo, using an established markup
> > > > > language (rest) is better than building your own custom creation.
> > > > >
> > > > > [1]: http://docutils.sourceforge.net/rst.html
> > > > > [2]: http://docutils.sourceforge.net/docs/howto/rst-directives.html
> > > > > [3]: http://docutils.sourceforge.net/docs/user/latex.html
> > > > >
> > > > > --
> > > > > Waylan Limberg
> > > > > wa...@gm...
> > > > >
> > > > > -------------------------------------------------------------------------
> > > > > Take Surveys. Earn Cash. Influence the Future of IT
> > > > > Join SourceForge.net's Techsay panel and you'll get the chance to share your
> > > > > opinions on IT & business topics through brief surveys-and earn cash
> > > > > http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
> > > > > _______________________________________________
> > > > > Python-markdown-discuss mailing list
> > > > > Pyt...@li...
> > > > > https://lists.sourceforge.net/lists/listinfo/python-markdown-discuss
> > > > >
> > > >
> > > >
> > > > --
> > > > Yuri Takhteyev
> > > > UC Berkeley School of Information
> > > > http://www.freewisdom.org/
> > > >
> > > > -------------------------------------------------------------------------
> > > > Take Surveys. Earn Cash. Influence the Future of IT
> > > > Join SourceForge.net's Techsay panel and you'll get the chance to share your
> > > > opinions on IT & business topics through brief surveys-and earn cash
> > > > http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
> > > > _______________________________________________
> > > > Python-markdown-discuss mailing list
> > > > Pyt...@li...
> > > > https://lists.sourceforge.net/lists/listinfo/python-markdown-discuss
> > > >
> > >
> > >
> > > --
> > > Ben Wilson
> > > "Words are the only thing which will last forever" Churchill
> > >
> >
> >
> > --
> > Yuri Takhteyev
> > UC Berkeley School of Information
> > http://www.freewisdom.org/
> >
>
>
> --
> Ben Wilson
> "Words are the only thing which will last forever" Churchill
>


-- 
Yuri Takhteyev
UC Berkeley School of Information
http://www.freewisdom.org/

Re: [Python-markdown-discuss] Overriding Functions, etc.

From: Ben W. <da...@gm...> - 2007-06-13 11:58:12

On 6/12/07, Yuri Takhteyev <qar...@gm...> wrote:
> This looks good.  It is however, tied to the question of how the
> processors work, so those two issues need to be discussed together.
> This implementation assumes that everything is text-in-text-out.

The assumption you cite is based on my implementation of that class.
What I offer is the core premise, of casually linking processor order
and heaping a final order. I realize the TITO is not how Markdown does
things and anticipate that you would make the relevant changes. I'm
familiar enough with how you wrote your implementation, but not enough
to presume to offer a turnkey solution. I snipped out later comments
where you tried to reconcile the difference. That is beyond my scope.
:-) However, based on your later commentary, I've decided to re-tool
what I offered so the resultant tool would be more universally
applicable.

> [...]
> Another thing: Part of your code seems to implement a general
> register-deregister-sort logic which would potentially be useful for
> things other than markdown.  Have you thought of wrapping it up into a
> separate module?

Actually, I have. After I posted the example to you, I noticed it
would be preferable to abstract it out. However, my abstraction was
still coupled to text manipulation. I believe I would remove the
"parse" function.

> This way inside python-markdown one would simply
> use:
>
>     import treeregistry  ## just making up a name for now
>
>     r = treeregistry.Registry()
>     r.register('fulltext','>_begin')
>     r.register('split','>fulltext')
>     ...
>     r.register('[[', 'links', r'(\[\[\s*(.*?)\]\])(s?)', make_link)
>     load_extension(r)
>     processors = r.get_sorted()
>
> Then from here on we just use a list of pre-sorted processors.

FWIW, I would suggest keeping with 'sorted' as the function name as it
is similar to the Python built-in of the same name. You know what I
mean, but to make sure I'm making my point, I'll explain. Calling
list.sort() sorts the list in place and returns None. Calling
sorted(list) returns a new, sorted list and leaves the original
untouched. So, to a Python programmer, r.sorted() should be
self-documenting without the 'get_' prefix.
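
For anyone following along, the difference can be checked in a few
lines; the list contents here are just placeholder names.

```python
names = ['strong', 'emphasis', 'links']
original = list(names)        # keep a copy for comparison

ordered = sorted(names)       # built-in: returns a new sorted list
unchanged = (names == original)

result = names.sort()         # list method: sorts in place, returns None
```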

But, your general point is valid. More importantly, you can get rid of
the regex reference altogether. The code bit below should be close to
how Python Markdown could use it.

    link_processor = (r'\[\[(.*?)\]\]', make_link)
    r.register('[[','links',link_processor)

Then, the registry is absolutely agnostic. Your local use would
extract the ordered list of tuples. Perhaps I could have a local use
that extracted an ordered list of objects. Oh, crap; this just solved
a sort issue I gave up on last Fall!
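
A rough sketch of the consuming side, assuming the registry hands back
an ordered list of (pattern, handler) tuples. The body of make_link and
the apply_processors helper are made up for illustration, and the
hard-coded list stands in for whatever r.sorted() would return.

```python
import re


def make_link(match):
    # Hypothetical handler: turn [[Target]] into a simple anchor tag.
    target = match.group(1).strip()
    return '<a href="%s">%s</a>' % (target, target)


link_processor = (r'\[\[(.*?)\]\]', make_link)

# Stand-in for the ordered payloads a registry would return.
processors = [link_processor]


def apply_processors(text, processors):
    # The registry knows nothing about regexes; only this caller does.
    for pattern, handler in processors:
        text = re.sub(pattern, handler, text)
    return text


html = apply_processors('see [[Home]] for more', processors)
```

Since the payload is opaque to the registry, the same ordering code
could just as well carry objects instead of tuples.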

I'll re-tool the code to be agnostic and post it later today or tomorrow.

-- 
Ben Wilson
"Words are the only thing which will last forever" Churchill