From: Yuri T. <qar...@gm...> - 2007-06-09 15:43:04
I am sorry I didn't follow up on this thread. It came at a time when I was super busy, and I then didn't get around to going back to it, though it's been in the back of my mind.

I am willing to discuss the question of how post- and pre-processing is organized, even if some of the solutions are not going to be backwards compatible. I wouldn't want to make such changes on a whim, but we can start thinking of version "2.0", which could potentially be quite different. I am not sure I will attempt to do a radical redesign on my own, but if there are other people interested, we could do it as a community project. Ben, can you send us a more detailed explanation of your proposal?

However, if we start talking about a radical change ("2.0"), then I think we also need to talk about a more serious architectural problem, which is the uncomfortable mix of regular expressions and DOM trees. The current parser is based on regular expressions. Once a regular expression is applied, we typically break the string in half, which prevents later regular expressions from matching across the break. E.g., we start with "**[foo](x.html)**" and match the link pattern. This gives us a list ["**", DOM_FRAGMENT, "**"]. We can no longer match the "**...**" pattern.

I've thought of a few possible solutions for it:

1. Ditch the DOM and just do a bunch of string-to-string transformations. This might be the most straightforward solution, but very un-pythonic and not something I would be interested in doing personally.

2. Write a special data structure that can behave as a list or tree of DOM fragments while also fitting with the current RE library. One way to do that would be to represent the half-parsed document as a string and a list of DOM nodes, where the string would have placeholders for the DOM nodes. In this case, instead of ["**", DOM_FRAGMENT, "**"] we would have an object with fields str = "**\x0**", doms = [DOM_FRAGMENT].
We could then run doc.str through a regular expression, check whether any part of the match contains the placeholders, and then work out the details.

3. Switch to some other method of parsing. Maybe something from this list:
http://nedbatchelder.com/text/python-parsers.html

Note that if we go for #3, then the whole preprocessors/postprocessors thing would end up looking very different.

- yuri

On 6/8/07, Ben Wilson <da...@gm...> wrote:
> It's been a while since we discussed this (April), but I thought I'd
> come back. I've looked at how PmWiki organizes the various markups as
> compared to Markdown.
>
> In response to my statement that PmWiki had an elegant, ad-hoc method
> for adding new markup, Waylan said: "And not very pythonic. I remember
> the first time I realized how PmWiki did some very OO like things
> without OO code. For PHP it was amazing -
> and a pleasure to work with. Especially considering PHP's OO syntax. Ugh!"
>
> I've since taken the time to analyze how Patrick Michaud accomplished
> this. Quite simply, he uses a hash-of-hashes to organize markup
> relative to other markup (e.g., Strong before Emphasis). At
> parse-time, he then passes this H-o-H through a custom heap algorithm
> to divine the absolute parse order. I re-implemented his solution in
> Python. It is very Pythonic since his custom heap exists in Python's
> heapq library. This means the sorting is likely optimized in C. I
> think Waylan "failed to see the forest for all of the trees" because
> he allowed the confines of PHP to conceal the simple elegance of the
> solution.
>
> He also focused on the big-picture, which was PmWiki, and did not see
> the small facet I was focusing on, which was markup management. What
> Patrick solved was how to allow a developer simply to insert new
> markup into a markup tree. Rather than extend the class, or mess with
> the internals of class Markdown, Patrick's solution allows flexibility
> in the class.
> The way Markdown is now, in order for me to add some
> behavior I wanted, I had to tinker with the Markdown class's internals.
> Now, to add markup, all I need to do is tell my parser that I want it
> to occur during inline, or even that it must occur before Emphasis.
> Thus, for a wiki engine that allows developers to insert/change markup
> by plug-in, the process is very OO. There's a reason Patrick is a PhD.
> While PHP is inelegant, and Patrick's code is sometimes confusing, I
> am constantly amazed at how he solves problems.
>
> I invite you to consider PmWiki's Markup engine (specifically function
> Markup(); and BuildMarkupRules();) The former instructs on how to
> extend markup ad-hoc. The latter instructs how to take the resulting
> heap and build a parse tree.
>
> The only problem is that implementing this would not be backward
> compatible. But, this is Pythonic as well, as the BDFL willingly
> disregards tradition when warranted. It is not backward compatible
> because it totally dismisses the present mechanism for ordering
> markup. However, I think the gains are worth the cost.
>
> Warm Regards,
> Ben Wilson
>
> On 4/10/07, Yuri Takhteyev <qar...@gm...> wrote:
> > Just wanted to let you guys know that I am reading this, but don't
> > have time to think about it seriously and respond this week. However,
> > from what I see so far, I think Ben identified a real problem and I
> > would love it if you guys could come up with a solution that addresses
> > most of the points that have been brought up so far.
> >
> > Ideally, this solution would maintain backwards compatibility with
> > existing extensions. If not, we can still put it in, but we'll have
> > to think more carefully of when to release it and whether there should
> > be a more general upgrade of how the extension mechanism works.
> > (I.e., I think it's ok to change the extension framework once, but not
> > every day.)
> >
> > - yuri
> >
> > On 4/10/07, Waylan Limberg <wa...@gm...> wrote:
> > >
> > >
> > > Ben Wilson wrote:
> > > [snip]
> > > > PmWiki has a situation where markups may be added willy-nilly while
> > > > maintaining order. It would be rather radical to introduce to
> > > > Markdown().
> > >
> > > And not very pythonic. I remember the first time I realized how PmWiki
> > > did some very OO like things without OO code. For PHP it was amazing -
> > > and a pleasure to work with. Especially considering PHP's OO syntax. Ugh!
> > >
> > > But if one tried to use PmWiki's approach in python, it would probably
> > > be more work than it's worth. A subclass of dict which maintains order
> > > or a class wrapping a list of tuples would be much less effort -- and
> > > more pythonic. For that matter, it wouldn't be all that difficult to build
> > > a class from scratch for such a purpose.
> > >
> > > [snip]
> > > > want the conversion to occur before/after/during another item. I
> > > > mention PmWiki only because I'm very familiar with its approach and
> > > > know its author seeks ease-of-customization. Markdown() generally does
> > > > not mean to be as customizable as it follows the Markdown standard
> > > > format.
> > >
> > > Ahh, now I know why your name seemed so familiar. Although I've been out
> > > of the (PmWiki) loop for about a year now. It is true that Markdown does not
> > > come close to PmWiki. If you're looking for more power, perhaps you
> > > should look at reStructuredText [1]. It seems to be the python default
> > > for markup, is easily extendable [2], and will output LaTeX [3].
> > > Personally, I prefer Markdown for its simplicity, but you seem to want
> > > power which brings more complexity. Imo, using an established markup
> > > language (rest) is better than building your own custom creation.
> > >
> > > [1]: http://docutils.sourceforge.net/rst.html
> > > [2]: http://docutils.sourceforge.net/docs/howto/rst-directives.html
> > > [3]: http://docutils.sourceforge.net/docs/user/latex.html
> > >
> > > --
> > > Waylan Limberg
> > > wa...@gm...
> > >
> > > -------------------------------------------------------------------------
> > > Take Surveys. Earn Cash. Influence the Future of IT
> > > Join SourceForge.net's Techsay panel and you'll get the chance to share your
> > > opinions on IT & business topics through brief surveys-and earn cash
> > > http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
> > > _______________________________________________
> > > Python-markdown-discuss mailing list
> > > Pyt...@li...
> > > https://lists.sourceforge.net/lists/listinfo/python-markdown-discuss
> > >
> >
> >
> > --
> > Yuri Takhteyev
> > UC Berkeley School of Information
> > http://www.freewisdom.org/
> >
>
>
> --
> Ben Wilson
> "Words are the only thing which will last forever" Churchill
>
--
Yuri Takhteyev
UC Berkeley School of Information
http://www.freewisdom.org/
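
Option 2 in the message at the top of this thread (a string with placeholders plus a list of DOM nodes) could be sketched in Python roughly as follows. This is only an illustration under assumptions: the class name HalfParsedDoc, its apply() method, and the two regular expressions are hypothetical and not part of python-markdown. The placeholder written as "\x0" in the message is spelled "\x00" here, and xml.dom.minidom stands in for the DOM.

```python
import re
from xml.dom import minidom

PLACEHOLDER = "\x00"  # one placeholder character per stored DOM node


class HalfParsedDoc:
    """Hypothetical half-parsed document: a string plus the DOM nodes
    that the placeholders in that string stand for."""

    def __init__(self, text):
        self.str = text
        self.doms = []  # doms[i] belongs to the i-th placeholder in self.str
        self.dom = minidom.Document()

    def _children(self, text, first):
        """Turn text containing placeholders into a list of child nodes.
        `first` is the index in self.doms of the first placeholder in text."""
        nodes = []
        for i, chunk in enumerate(text.split(PLACEHOLDER)):
            if i > 0:
                nodes.append(self.doms[first + i - 1])
            if chunk:
                nodes.append(self.dom.createTextNode(chunk))
        return nodes

    def apply(self, pattern, tag, attr=None):
        """Wrap group 1 of the first match in <tag>. If attr is given,
        set that attribute from group 2. Assumes placeholders only
        occur inside group 1 of the match."""
        m = re.search(pattern, self.str)
        if m is None:
            return False
        # Index of the first placeholder at or after the match start.
        first = self.str.count(PLACEHOLDER, 0, m.start())
        used = m.group(0).count(PLACEHOLDER)
        node = self.dom.createElement(tag)
        if attr:
            node.setAttribute(attr, m.group(2))
        inner_first = first + self.str.count(PLACEHOLDER, m.start(), m.start(1))
        for child in self._children(m.group(1), inner_first):
            node.appendChild(child)
        # The matched span collapses into one placeholder for the new node.
        self.doms[first:first + used] = [node]
        self.str = self.str[:m.start()] + PLACEHOLDER + self.str[m.end():]
        return True


doc = HalfParsedDoc("**[foo](x.html)**")
doc.apply(r"\[([^\]]*)\]\(([^)]*)\)", "a", attr="href")  # link pattern first
doc.apply(r"\*\*(.+?)\*\*", "strong")                    # "**...**" still matches
print(doc.doms[0].toxml())
```

Because the link is replaced by a single character instead of splitting the string, the later "**...**" pattern still sees one contiguous string, and the stored link node ends up as a child of the new strong element.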