From: Ben W. <da...@gm...> - 2007-04-10 11:56:41
I'm currently playing with a Python-based flat-file wiki that relies on Markdown syntax (mostly). This is more for my education, as I understand MoinMoin is the premier wiki engine for Python and it seems to support some Markdown markup.[1] So, I've made a few observations that I'm trying to explain here.

I would like to say the Python-Markdown class is great. I like being able to extend it. For example, I handle the ==== and ---- headings differently, which pushes me to short-circuit the headings_preprocessor. The reason for my short-circuiting is that I order headers differently and include ++++ and ~~~~, and the preprocessor presupposes that a section heading (====) should be <h1>.[2] I'm able to get both lines of a heading (the title and the underline markup) with a regex without lookaheads, but it may not be stable.

I would also like to change the output to TeX/LaTeX format if possible.

I was thinking that _perhaps_ the preprocessors could be a dictionary instead of an array, which would allow deleting by name. Presently, I set all the preprocessors by copying from the class, then delete the one preprocessor I don't want. This would also allow use of has_key(), as the present implementation (correct me if I'm wrong) does not allow listing or deleting a preprocessor directly.

I'm also having a bit of a problem with the documentation. I'm used to an "undue experimentation" rule for documentation: a programmer should be able to look to the documentation alone to learn _how_ to use the class, such as adding a preprocessor. I ended up having to look in the class itself to see how it added a preprocessor. Perhaps the example given could be an actual implementation of a simple preprocessor?

FWIW, I am adding some non-Markdown syntax that is more expected in wiki markup (e.g. [[free links]]).

Anyway, thanks for a great Markdown class. It moved my wiki's development up by about three weekends.
--
Ben Wilson
"Words are the only thing which will last forever" Churchill

[1]: Why Python? I am already a strong Perl and PHP programmer, and I'm trying to extend my language skills.
[2]: The only <h1> I have on the page is the page title. This means section headings (====) are <h2>; sub-sections (----) are <h3>; paragraph headings (~~~~) are <h4>; and sub-paragraph headings (++++) are <h5>. I never use <h6>. Using <h1> for the page title is based on an older web standard that has been around for a while, which I am never able to find when I want to cite it.
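The two-line headings described in footnote [2] can be matched without lookaheads by consuming the title line and its underline in a single regex. The sketch below is a hypothetical stand-in for a custom heading preprocessor, not Python-Markdown's headings_preprocessor. `convert_headings` and `UNDERLINE_LEVELS` are illustrative names, and it assumes an underline is four or more copies of one character.

```python
import re

# Hypothetical mapping from underline character to heading level, following
# footnote [2]: <h1> is reserved for the page title.
UNDERLINE_LEVELS = {"=": 2, "-": 3, "~": 4, "+": 5}

# A title line followed by a line of one repeated underline character.
# Both lines are consumed by one match, so no lookahead is needed; the
# backreference (?P=mark) forces the underline to use a single character.
HEADING_RE = re.compile(
    r"^(?P<title>[^\n]+)\n(?P<mark>[=\-~+])(?P=mark){3,}[ \t]*$",
    re.MULTILINE,
)


def convert_headings(text):
    """Replace two-line headings with <hN> tags."""
    def replace(match):
        level = UNDERLINE_LEVELS[match.group("mark")]
        title = match.group("title").strip()
        return "<h%d>%s</h%d>" % (level, title, level)
    return HEADING_RE.sub(replace, text)
```

For example, `convert_headings("Intro\n====\n\nDetails\n----\nbody")` yields `<h2>Intro</h2>` followed by `<h3>Details</h3>`, leaving the body line alone.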
From: Waylan L. <wa...@gm...> - 2007-04-10 16:38:14
Hi Ben,

Ben Wilson wrote:
[snip]
> I would also like to change the output to Tex/LaTeX format if possible.

I've thought about that as well. Currently, most of the output is stored in a minidom instance (the exception being raw HTML blocks). I suppose it could be possible to extend the minidom class to output different formats, but I haven't looked into it. Perhaps if python-markdown used one of the more common Python DOM modules instead of its own custom minidom class, this would be easier (code might already exist). I don't know what Yuri's thoughts on this would be.

> I was thinking that _perhaps_ the pre-processors could be a dictionary
> instead of an array, which would allow deleting by name. Presently, I
> set all the preprocessors by copying from the class, then delete the
> one preprocessor I don't want. This would also allow use of has_key(),
> as the present implementation (correct me if I'm wrong) does not allow
> listing or deleting a preprocessor directly.

This seems like a good idea until we remember that dictionaries do not preserve order. In this case, order is very important, as certain preprocessors must absolutely be run before others. That's impossible with a dictionary. With a little searching you'll find that various projects have implemented their own non-standard ordered dict to address this issue, but every implementation is a little different. Another possibility could be a list of tuples [(key, value), (key, value)], but that can be a pain to work with. That is why a simple list is used. Currently it is easy enough to refer to each item by its index, but if you're making multiple changes, I can see how that could be problematic.

> I'm also having a bit of a problem with the documentation. [...]
> Perhaps the example given could be an actual
> implementation of a simple preprocessor?

Generally the footnote extension is referred to as an example, as it uses all three methods of adding extensions (preprocessors, patterns, and postprocessors).

--
Waylan Limberg
wa...@gm...
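The "class wrapping a list of tuples" option is less painful than it sounds if the wrapper does the bookkeeping. A minimal sketch, assuming names are unique; `OrderedRegistry` and its methods are hypothetical, not part of python-markdown. Order lives in a plain list, so run order is never lost, and names add membership tests, lookup, and delete-by-name on top.

```python
class OrderedRegistry(object):
    """An ordered list of (name, item) pairs that can be addressed by name."""

    def __init__(self):
        self._pairs = []

    def _index(self, name):
        for i, (key, _) in enumerate(self._pairs):
            if key == name:
                return i
        raise KeyError(name)

    def _check_new(self, name):
        if name in self:
            raise KeyError("%r is already registered" % name)

    def __contains__(self, name):
        return any(key == name for key, _ in self._pairs)

    def __getitem__(self, name):
        return self._pairs[self._index(name)][1]

    def __delitem__(self, name):
        del self._pairs[self._index(name)]

    def __iter__(self):
        # Iterating yields the items in run order, just like the plain list.
        return iter([item for _, item in self._pairs])

    def keys(self):
        return [key for key, _ in self._pairs]

    def append(self, name, item):
        self._check_new(name)
        self._pairs.append((name, item))

    def insert_before(self, anchor, name, item):
        self._check_new(name)
        self._pairs.insert(self._index(anchor), (name, item))

    def insert_after(self, anchor, name, item):
        self._check_new(name)
        self._pairs.insert(self._index(anchor) + 1, (name, item))
```

Later Python versions added `collections.OrderedDict` (2.7) and insertion-ordered `dict` (3.7), but neither can insert a new key in the middle, which is the operation an extension usually needs.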
From: Ben W. <da...@gm...> - 2007-04-10 17:02:11
On 4/10/07, Waylan Limberg <wa...@gm...> wrote:
[...]
> This seems like a good idea until we remember that dictionaries do not
> preserve order. In this case, order is very important, as certain
> pre-processors must absolutely be run before others. [...]

Right, a dict is not sequenced, but there are a couple of ways, off-hand, that might work:

    def PreprocessorInsert(self, key, value, index=None):
        self.preprocessors[key] = value
        if index is None:
            self.preprocessors_order.append(key)
        else:
            # list.insert() takes the position first, then the item
            self.preprocessors_order.insert(index, key)

Then, when running the preprocessors:

    for key in self.preprocessors_order:
        pre = self.preprocessors[key]
        # ... run pre as before

PmWiki handles a situation where markups may be added willy-nilly while maintaining order. It would be rather radical to introduce to Markdown(). I'll try to describe it as best I can, as there are two ways to position markups: by group and relative to other markups. First, the basic syntax for adding markup is:

    Markup('name', 'phase', 'regex', 'substitute')

* Name is the key for the regex, which allows a standard markup to be overridden by custom markup and makes it easy to identify.
* Phase positions the markup either by category (e.g. preprocessor, inline, postprocessor) or relationally: '<##' would occur before '##', and '<inline' would essentially have it occur before all other inlines. Here '##' is the name of the markup it precedes (or else a phase), not the markup syntax itself.
* Regex is self-explanatory.
* Substitute is self-explanatory; it can be either text or a function.

The code does the shuffling before text conversion. This has the advantage of not needing to know the sequence per se, only that you want the conversion to occur before, after, or during another item. I mention PmWiki only because I'm very familiar with its approach and know its author seeks ease of customization. Markdown() generally is not meant to be as customizable, as it follows the standard Markdown format.

[...]
> > Perhaps the example given could be an actual
> > implementation of a simple preprocessor?
>
> Generally the footnote extension is referred to as an example as it uses
> all three methods of adding extensions (pre-processors, patterns, and
> post-processors).

Thanks!
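One way to turn relational phases ('<name', '>name', or a plain group) into a total order is to give every rule a string sort key built from its anchor's key. The sketch below illustrates that idea under stated assumptions. It is not PmWiki's actual Markup()/BuildMarkupRules() code. `resolve_order` and the example rule names are hypothetical, and rule names must not collide with group names.

```python
def resolve_order(rules, groups):
    """Return the names in `rules` in run order.

    rules  -- dict of name -> phase; a phase is a group name ('inline'),
              '<name' (run before name) or '>name' (run after name).
    groups -- group names in their fixed run order.
    """
    # Zero-padded prefixes keep groups in list order under string comparison.
    keys = dict((group, "%03d" % i) for i, group in enumerate(groups))

    def key_for(name, pending=()):
        if name in keys:
            return keys[name]
        if name in pending:
            raise ValueError("circular phase at %r" % name)
        if name not in rules:
            raise KeyError("unknown rule or group %r" % name)
        phase = rules[name]
        if phase[:1] in ("<", ">"):
            mark, anchor = phase[0], phase[1:]
        else:
            mark, anchor = "=", phase          # plain group membership
        keys[name] = key_for(anchor, pending + (name,)) + mark
        return keys[name]

    # In ASCII '<' < '=' < '>'. Appending '=' as a terminator puts each rule
    # after everything anchored '<' to it and before everything anchored
    # '>' to it; equal keys (same group, no constraint) fall back to name.
    return sorted(rules, key=lambda name: (key_for(name) + "=", name))
```

For instance, with "strong" registered as '<emphasis' and "emphasis" as 'inline', strong's key is emphasis's key plus '<', so it sorts first. Nobody needs to know absolute positions.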
From: Waylan L. <wa...@gm...> - 2007-04-10 23:16:05
Ben Wilson wrote:
[snip]
> PmWiki has a situation where markups may be added willy-nilly while
> maintaining order. It would be rather radical to introduce to
> Markdown().

And not very Pythonic. I remember the first time I realized how PmWiki did some very OO-like things without OO code. For PHP it was amazing, and a pleasure to work with, especially considering PHP's OO syntax. Ugh!

But if one tried to use PmWiki's approach in Python, it would probably be more work than it's worth. A subclass of dict which maintains order, or a class wrapping a list of tuples, would be much less effort, and more Pythonic. For that matter, it wouldn't be all that difficult to build a class from scratch for such a purpose.

[snip]
> want the conversion to occur before/after/during another item. I
> mention PmWiki only because I'm very familiar with its approach and
> know its author seeks ease-of-customization. Markdown() generally does
> not mean to be as customizable as it follows the Markdown standard
> format.

Ahh, now I know why your name seemed so familiar, although I've been out of the PmWiki loop for about a year now. It is true that Markdown does not come close to PmWiki. If you're looking for more power, perhaps you should look at reStructuredText [1]. It seems to be the Python default for markup, is easily extendable [2], and will output LaTeX [3]. Personally, I prefer Markdown for its simplicity, but you seem to want power, which brings more complexity. IMO, using an established markup language (reST) is better than building your own custom creation.

[1]: http://docutils.sourceforge.net/rst.html
[2]: http://docutils.sourceforge.net/docs/howto/rst-directives.html
[3]: http://docutils.sourceforge.net/docs/user/latex.html

--
Waylan Limberg
wa...@gm...
From: Yuri T. <qar...@gm...> - 2007-04-10 23:42:06
Just wanted to let you guys know that I am reading this, but don't have time to think about it seriously and respond this week. However, from what I see so far, I think Ben identified a real problem and I would love it if you guys could come up with a solution that addresses most of the points that have been brought up so far.

Ideally, this solution would maintain backwards compatibility with existing extensions. If not, we can still put it in, but we'll have to think more carefully about when to release it and whether there should be a more general upgrade of how the extension mechanism works. (I.e., I think it's ok to change the extension framework once, but not every day.)

- yuri

--
Yuri Takhteyev
UC Berkeley School of Information
http://www.freewisdom.org/
From: Ben W. <da...@gm...> - 2007-06-09 01:59:14
It's been a while since we discussed this (April), but I thought I'd come back to it. I've looked at how PmWiki organizes its various markups as compared to Markdown.

In response to my statement that PmWiki had an elegant, ad hoc method for adding new markup, Waylan said: "And not very pythonic. I remember the first time I realized how PmWiki did some very OO like things without OO code. For PHP it was amazing - and a pleasure to work with. Especially considering PHP's OO syntax. Ugh!"

I've since taken the time to analyze how Patrick Michaud accomplished this. Quite simply, he uses a hash of hashes to organize markup relative to other markup (e.g., Strong before Emphasis). At parse time, he then passes this hash of hashes through a custom heap algorithm to determine the absolute parse order. I re-implemented his solution in Python. It is very Pythonic, since his custom heap exists in Python's heapq library, which means the sorting is likely optimized in C. I think Waylan "failed to see the forest for all of the trees" because he allowed the confines of PHP to conceal the simple elegance of the solution.

He also focused on the big picture, which was PmWiki, and did not see the small facet I was focusing on, which was markup management. What Patrick solved was how to let a developer simply insert new markup into a markup tree. Rather than requiring you to extend the class or mess with the internals of class Markdown, Patrick's solution builds flexibility into the class. The way Markdown is now, in order to add some behavior I wanted, I had to tinker with the Markdown class's internals. With Patrick's approach, to add markup, all I need to do is tell my parser that I want it to occur during inline, or even that it must occur before Emphasis. Thus, for a wiki engine that allows developers to insert or change markup by plug-in, the process is very OO. There's a reason Patrick is a PhD. While PHP is inelegant, and Patrick's code is sometimes confusing, I am constantly amazed at how he solves problems.

I invite you to consider PmWiki's markup engine (specifically the functions Markup() and BuildMarkupRules()). The former shows how to extend markup ad hoc; the latter shows how to take the resulting heap and build a parse tree.

The only problem is that implementing this would not be backward compatible. But this is Pythonic as well, as the BDFL willingly disregards tradition when warranted. It is not backward compatible because it totally dismisses the present mechanism for ordering markup. However, I think the gains are worth the cost.

Warm Regards,
Ben Wilson

--
Ben Wilson
"Words are the only thing which will last forever" Churchill
From: Yuri T. <qar...@gm...> - 2007-06-09 15:43:04
I am sorry I didn't follow up on this thread. It came at a time when I was super busy, and I then didn't get around to going back to it, though it's been in the back of my mind.

I am willing to discuss the question of how post- and preprocessing is organized, even if some of the solutions are not going to be backwards compatible. I wouldn't want to make such changes on a whim, but we can start thinking of version "2.0", which could potentially be quite different. I am not sure I will attempt to do a radical redesign on my own, but if there are other people interested, we could do it as a community project.

Ben, can you send us a more detailed explanation of your proposal?

However, if we start talking about a radical change ("2.0"), then I think we also need to talk about a more serious architectural problem, which is the uncomfortable mix of regular expressions and DOM trees. The current parser is based on regular expressions; once a regular expression is applied, we typically break the string in half, which prevents us from matching later regular expressions. E.g., we start with "**[foo](x.html)**" and match the link pattern. This gives us a list ["**", DOM_FRAGMENT, "**"]. We now can't match the "**...**".

I've thought of a few possible solutions for it:

1. Ditch the DOM and just do a bunch of string-to-string transformations. This might be the most straightforward solution, but it is very un-Pythonic and not something I would be interested in doing personally.

2. Write a special data structure that can behave as a list or tree of DOM fragments while also fitting with the current RE library. One way to do that would be to represent the half-parsed document as a string and a list of DOM nodes, where the string would have placeholders for the DOM nodes. In this case, instead of ["**", DOM_FRAGMENT, "**"] we would have an object with fields str = "**\x00**", doms = [DOM_FRAGMENT]. We could then run doc.str through the regular expression, check whether any part of the match contains placeholders, then work out the details.

3. Switch to some other method of parsing. Maybe something from this list: http://nedbatchelder.com/text/python-parsers.html

Note that if we go for #3, then the whole preprocessors/postprocessors thing would end up looking very different.

- yuri

--
Yuri Takhteyev
UC Berkeley School of Information
http://www.freewisdom.org/
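Option 2 can be prototyped with the standard library. In this sketch, `HalfParsed`, `stash`, `apply` and `fill` are hypothetical names, and ElementTree stands in for python-markdown's own minidom. Each placeholder carries the index of its node, so several nodes can share one string, and "**...**" still matches after the link has been turned into a node.

```python
import re
import xml.etree.ElementTree as ET

MARK = "\x00"                        # assumed never to appear in real input
TOKEN = re.compile(MARK + r"(\d+)" + MARK)


class HalfParsed(object):
    """A string plus the DOM nodes its placeholders stand for."""

    def __init__(self, text):
        self.str = text
        self.doms = []

    def stash(self, node):
        # Keep the finished node and return "\x00<index>\x00" in its place.
        self.doms.append(node)
        return "%s%d%s" % (MARK, len(self.doms) - 1, MARK)

    def apply(self, pattern, build):
        # Run a pattern over the flat string; each match becomes a node.
        self.str = pattern.sub(lambda m: self.stash(build(m)), self.str)

    def fill(self, parent, text):
        # Put text into a fresh element, swapping placeholders back for the
        # stored nodes. split() with one group alternates text and index.
        pieces = TOKEN.split(text)
        parent.text = pieces[0] or None
        for i in range(1, len(pieces), 2):
            node = self.doms[int(pieces[i])]
            node.tail = pieces[i + 1] or None
            parent.append(node)


LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")
STRONG = re.compile(r"\*\*(.+?)\*\*")

doc = HalfParsed("**[foo](x.html)**")


def build_link(m):
    a = ET.Element("a", href=m.group(2))
    a.text = m.group(1)
    return a


def build_strong(m):
    strong = ET.Element("strong")
    doc.fill(strong, m.group(1))      # the inner text may hold placeholders
    return strong


doc.apply(LINK, build_link)           # doc.str is now "**\x000\x00**"
doc.apply(STRONG, build_strong)       # "**...**" still matches the flat string
root = ET.Element("p")
doc.fill(root, doc.str)
html = ET.tostring(root, encoding="unicode")
```

Because placeholders are ordinary characters, a later pattern that matches digits could cut one in half. A real implementation would have to reject or re-check such matches, which is the "work out the details" part.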
From: Ben W. <da...@gm...> - 2007-06-09 20:20:10
I need to modify something I said. There is no need to use Python's hashq I managed to reduce the sort to a nested array assignment. On 6/9/07, Yuri Takhteyev <qar...@gm...> wrote: > Ben, can you send us a more detailed explanation of your proposal? I propose a different, flexible method for prioritizing markup processing. This method has no effect on pattern matching/substitution (i.e., processors); so the DOM method you're using remains intact. While I personally prefer the string-to-string substitution, what is proposed is agnostic to how markup is processed. So, what I propose is a new organizer for the processors, not a new way to process. For example, preprocessors are ordered in an array: self.preprocessor. Postprocessors are likewise ordered. There are other similar "buckets." If I wanted to insert a preprocessor between two standard preprocessors (e.g. HTML_BLOCK, and LINE_BREAKS), then I have to manipulate the array. PmWiki's organizer is flexible. Each processor is named (dictionary or associative array-based). Each processor announces when it should be processed: before another processor, after another processor, or generally within a processor group. For example, if STRONG must occur before EMPHASIS, then we have: p.register('strong','<emphasis,...) If, alternatively, STRONG must occur _after_ EMPHASIS, then we have: p.register('strong','>emphasis,...) Finally, if we only want STRONG to occur at the same time as other inline processors, then we have: p.register('strong','inline',...) Replacing a processor is as easy as re-registering it. You can also deregister a processor. As we all know, dictionary elements are not ordered. The problem of order would exist even if dictionaries were ordered. This is because it is possible to register a new processor at any time before parsing begins and properly ordering in any language's associative array would be a royal pain. 
Patrick provides a solution in his code: each processor registers its order relative to other processors, and a heap algorithm resolves those relationships. When the heap is sorted at parse time, the relationships among the various markups resolve to the final processing order.

I believe this organizer is more OO than the current Markdown implementation. Mind you, I come from a couple of decades of functional programming, so my understanding of OO is hard-earned. I believe a proper class avoids having to manipulate its internal structure. Having to play with self.preprocessor to add

You would add another class to the Markdown suite: "Parser." This class would have the following methods: register, deregister, sorted, and parse. The first two should be self-descriptive. sorted() would build an array that properly orders the keys of the registered processors. parse() would receive the Markdown text and convert it into HTML.

    None       Parser.register(key, where, pattern, replacement, constants (e.g. re.M))
    None       Parser.deregister(key or [keys])
    List       Parser.sorted()
    html_text  Parser.parse(markdown_text)

-- Ben Wilson "Words are the only thing which will last forever" Churchill |
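Here is a minimal Python sketch of the Parser class with the four signatures listed above. It assumes text-in/text-out processing with re.sub, as in Ben's implementation. Several details are assumptions:

* The `flags` parameter stands in for the "constants (e.g. re.M)" argument.
* The group names are hypothetical.
* `sorted()` uses Python's heapq to echo the heap mentioned above.
* The '<' / '=' / '>' sort-key scheme may not match Ben's or Patrick's actual code.

```python
import heapq
import re


class Parser:
    """Sketch of the proposed organizer (text-in/text-out).

    Each processor is registered under a key with a 'where' spec:
    '<other' (run before other), '>other' (run after other), or a group name.
    """

    # Hypothetical processing groups and their base sort keys.
    GROUPS = {"_begin": "A", "block": "B", "inline": "C", "_end": "D"}

    def __init__(self):
        self._rules = {}  # key -> (where, compiled pattern, replacement)

    def register(self, key, where, pattern, replacement, flags=0):
        # Re-registering an existing key replaces the old processor.
        self._rules[key] = (where, re.compile(pattern, flags), replacement)

    def deregister(self, keys):
        # Accept a single key or a list of keys; unknown keys are ignored.
        if isinstance(keys, str):
            keys = [keys]
        for key in keys:
            self._rules.pop(key, None)

    def _seq(self, key, seen=()):
        # Sort key: the anchor's key plus '<' or '>', or the group's key plus '='.
        if key in self.GROUPS:
            return self.GROUPS[key]
        if key in seen:
            raise ValueError("circular ordering involving %r" % key)
        where = self._rules[key][0]
        if where[0] in "<>":
            return self._seq(where[1:], seen + (key,)) + where[0]
        return self._seq(where, seen + (key,)) + "="

    def sorted(self):
        # Heap of (sort key + '=', index, key). The trailing '=' puts a
        # processor keyed K + '<' before its anchor keyed K, and K + '>'
        # after it ('<' < '=' < '>' in ASCII); the index breaks ties in
        # dictionary (first-registration) order.
        heap = [(self._seq(k) + "=", i, k) for i, k in enumerate(self._rules)]
        heapq.heapify(heap)
        return [heapq.heappop(heap)[2] for _ in range(len(heap))]

    def parse(self, markdown_text):
        # Apply each processor's substitution in the resolved order.
        text = markdown_text
        for key in self.sorted():
            _, pattern, replacement = self._rules[key]
            text = pattern.sub(replacement, text)
        return text


p = Parser()
p.register("emphasis", "inline", r"\*(.+?)\*", r"<em>\1</em>")
p.register("strong", "<emphasis", r"\*\*(.+?)\*\*", r"<strong>\1</strong>")
print(p.sorted())                      # ['strong', 'emphasis']
print(p.parse("**bold** and *soft*"))  # <strong>bold</strong> and <em>soft</em>
```

Registering strong as '>emphasis' instead would let the single-asterisk pattern consume the double asterisks first. That is exactly the kind of ordering bug the organizer is meant to make easy to fix with one re-registration.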
From: Yuri T. <qar...@gm...> - 2007-06-13 03:40:32
|
This looks good. It is, however, tied to the question of how the processors work, so those two issues need to be discussed together. This implementation assumes that everything is text-in-text-out. While it is possible to do it this way (that's how Markdown.pl works, if I remember correctly), I think it will get pretty ugly if we try to do structural markup this way.

But looking at your code, I am starting to wonder if perhaps the thing to do is to strike a compromise: work with a tree at the structural level, while using regexp substitution for the low-level markup. This way, some handlers can return text but others can return a tree node:

    "__...__"   -> returns "<strong>...</strong>"
    "## Title"  -> returns a tree node for "H2", having applied the remaining handlers recursively to the text node of the child.

I will try to think about this more next weekend.

Another thing: part of your code seems to implement a general register-deregister-sort logic which would potentially be useful for things other than Markdown. Have you thought of wrapping it up into a separate module? This way inside python-markdown one would simply use:

    import treeregistry ## just making up a name for now

    r = treeregistry.Registry()
    r.register('fulltext','>_begin')
    r.register('split','>fulltext')
    ...
    r.register('[[', 'links', r'(\[\[\s*(.*?)\]\])(s?)', make_link)
    load_extension(r)
    processors = r.get_sorted()

Then from here on we just use a list of pre-sorted processors.

- yuri

On 6/10/07, Ben Wilson <da...@gm...> wrote: > Yuri, > > Here is code demonstrating what I am referring to. I created a file > called 'src' which contained a snippet of marked up text, which was > converted into HTML. Perhaps the merits are clearer, and you'll be > able to adjust Markdown to use this processor organizer. Both are the > same, but I believe the latter is optimized.
> > http://dausha.net/parse.py.txt > http://dausha.net/heap.py.txt > > Ben Wilson > > > On 6/9/07, Yuri Takhteyev <qar...@gm...> wrote: > > I am sorry I didn't follow up on this thread it. It came at a time > > when I was super busy and I then didn't get around to going back to > > it, though it's been on the back of my mind. > > > > I am willing to discuss the question of how post and pre-processing is > > organized, even if some of the solutions are not going to be backwards > > compatible. I wouldn't want to make such changes on a whim, but we > > can start thinking of version "2.0", which could potentially be quite > > different. I am not sure I will attempt to do a radical redesign on > > my own, but if there are other people interested, we could do it as a > > community project. > > > > Ben, can you send us a more detailed explanation of your proposal? > > > > However, if we start talking about a radical change ("2.0"), then i > > think we also need to talk about a more serious architectural problem, > > which is the uncomfortable mix of regular expressions and dom trees. > > The current parser is based on regular expressions, once a regular > > expression is applied we typically break the string in half, which > > prevents us from matching later regular expressions. E.g.: we start > > with "**[foo](x.html)**", and match the link pattern. This gives us a > > list ["**", DOM_FRAGMENT, "**"]. We now can't match the "**...**" > > now. > > > > I've thought of a few possible solutions for it: > > > > 1. Ditch the DOM and just do a bunch of strings-to-strings > > transformation. This might be the most straigh-forward solution, but > > very un-pythonic and not something I would be interested in doing > > personally. > > > > 2. Write a special data structure that can behave as a list or tree of > > DOM fragments while also fitting with the current RE library. 
One way > > to do that would be to represent the half-parsed document as a string > > and a list of DOM nodes, where the string would have placeholders for > > the DOM nodes. In this case, instead of ["**", DOM_FRAGMENT, "**"] we > > would have an object with fields str = "**\x0**", doms = > > [DOM_FRAGMENT]. We could then run doc.str through regular expression, > > check if any part of the match contains the placeholders, then work > > out the details. > > > > 3. Switch to some other method of parsing. Maybe something from this > > list: http://nedbatchelder.com/text/python-parsers.html > > > > Note that if we go for #3, then the whole preprocessors/postprocessors > > thing would end up looking very different. > > > > - yuri > > > > On 6/8/07, Ben Wilson <da...@gm...> wrote: > > > It's been a while since we discussed this (April), but I thought I'd > > > come back. I've looked at how PmWiki organizes the various markups as > > > compared to Markdown. > > > > > > In response to my statement that PmWiki had an elegant, ad-hoc method > > > for adding new markup, Waylan said: "And not very pythonic. I remember > > > the first time I realized how PmWiki did some very OO like things > > > without OO code. For PHP it was amazing - > > > and a pleasure to work with. Especially considering PHP's OO sytax. Uhg!" > > > > > > I've since taken the time to analyze how Patrick Michaud accomplished > > > this. Quite simply, he uses a hash-of-hashes to organize markup > > > relative to other markup (e.g., Strong before Emphasis). At > > > parse-time, he then passes this H-o-H through a custom heap algorithm > > > to divine the absolute parse order. I re-implemented his solution in > > > Python. It is very Pythonic since his custom heap exists in Python's > > > heapq library. This means the sorting is likely optimized in C. 
I > > > think Waylan "failed to see the forest for all of the trees" because > > > he allowed the confines of PHP to conceal the simple elegance of the > > > solution. > > > > > > He also focused on the big-picture, which was PmWiki, and did not see > > > the small facet I was focusing on, which was markup management. What > > > Patrick solved was how to allow a developer simply to insert new > > > markup into a markup tree. Rather than extend the class, or mess with > > > the internals of class Markdown, Patrick's solution allows flexibility > > > in the class. The way Markdown is now, in order for me to add some > > > behavior I wanted, I had to tinker with Markdown class' internals. > > > Now, to add markup, all I need to do is tell my parser that I want it > > > to occur during inline, or even that it must occur before Emphasis. > > > Thus, for a wiki engine that allows developers to insert/change markup > > > by plug-in, the process is very OO. There's a reason Patrick is a PhD. > > > While PHP is inelegant, and Patrick's code is sometimes confusing, I > > > am constantly amazed at how he solves problems. > > > > > > I invite you to consider PmWiki's Markup engine (specifically function > > > Markup(); and BuildMarkupRules();) The former instructs on how to > > > extend markup ad-hoc. The latter instructs how to take the resulting > > > heap and build a parse tree. > > > > > > The only problem would be implementing this would not be backward > > > compatible. But, this is Pythonic as well, as the BDFL willingly > > > disregards tradition when warranted. It is not backward compatible > > > because it totally dismisses the present mechanism for ordering > > > markup. However, I think the gains are worth the cost. > > > > > > Warm Regards, > > > Ben Wilson > > > > > > On 4/10/07, Yuri Takhteyev <qar...@gm...> wrote: > > > > Just wanted to let you guys know that I am reading this, but don't > > > > have time to think about it seriously and respond this week. 
However, > > > > from what I see so far, I think Ben identified a real problem and I > > > > would love it if you guys could come up with a solution that addresses > > > > most of the points that have been brought up so far. > > > > > > > > Ideally, this solution would maintain backwards compatibility with > > > > existing extensions. If not, we can still put it in, but we'll have > > > > to think more carefully of when to release it and whether there should > > > > be a more general upgrade of how the extension mechanism works. > > > > (I.e., I think it's ok to change the extension framework once, but not > > > > every day.) > > > > > > > > - yuri > > > > > > > > On 4/10/07, Waylan Limberg <wa...@gm...> wrote: > > > > > > > > > > > > > > > Ben Wilson wrote: > > > > > [snip] > > > > > > PmWiki has a situation where markups may be added willy-nilly while > > > > > > maintaining order. It would be rather radical to introduce to > > > > > > Markdown(). > > > > > > > > > > And not very pythonic. I remember the first time I realized how PmWiki > > > > > did some very OO like things without OO code. For PHP it was amazing - > > > > > and a pleasure to work with. Especially considering PHP's OO sytax. Uhg! > > > > > > > > > > But if one tried to use PmWiki's approach in python, it would probably > > > > > be more work than it's worth. A subclass of dict which maintains order > > > > > or a class wrapping a list of tuples would be much less effort -- and > > > > > more pythonic. For that matter, it wouldn't all that difficult to build > > > > > a class from scratch for such a purpose. > > > > > > > > > > [snip] > > > > > > want the conversion to occur before/after/during another item. I > > > > > > mention PmWiki only because I'm very familiar with its approach and > > > > > > know its author seeks ease-of-customization. Markdown() generally does > > > > > > not mean to be as customizable as it follows the Markdown standard > > > > > > format. 
> > > > > > > > > > Ahh, now I know why your name seemed so familiar. Although I've been out > > > > > of the (PmWIki) loop for about a year now. It is true that Markdown does not > > > > > come close to PmWiki. If you're looking for more power, perhaps you > > > > > should look at reStructuredText [1]. It seems to be the python default > > > > > for markup, is easily extendable [2], and will output LaTex [3]. > > > > > Personally, I prefer Markdown for its simplicity, but you seem to want > > > > > power which brings more complexity. Imo, using an establish markup > > > > > language (rest) is better than building your own custom creation. > > > > > > > > > > [1]: http://docutils.sourceforge.net/rst.html > > > > > [2]: http://docutils.sourceforge.net/docs/howto/rst-directives.html > > > > > [3]: http://docutils.sourceforge.net/docs/user/latex.html > > > > > > > > > > -- > > > > > Waylan Limberg > > > > > wa...@gm... > > > > > > > > > > > -- > > > > Yuri Takhteyev > > > > UC Berkeley School of Information > > > > http://www.freewisdom.org/ > > > > > > > > > -- > > > Ben Wilson > > > "Words are the only thing which will last forever" Churchill > > > > > > > > > -- > > Yuri Takhteyev > > UC Berkeley School of Information > > http://www.freewisdom.org/ > > > > > -- > Ben Wilson > "Words are the only thing which will last forever" Churchill > -- Yuri Takhteyev UC Berkeley School of Information http://www.freewisdom.org/ |
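Yuri's option 2 (quoted above: a string with placeholders plus a list of DOM nodes) and his later compromise (a tree at the structural level, regular expressions for inline markup) can be combined. The Python sketch below is an illustration under assumptions, not python-markdown code. The names HalfParsed, fill, link_pattern, strong_pattern and heading are hypothetical, and the `\x00N\x00` placeholder format is an arbitrary choice. It uses xml.etree.ElementTree for the tree and shows that "**...**" still matches after "[foo](x.html)" has already become a node.

```python
import re
import xml.etree.ElementTree as ET

PLACEHOLDER_RE = re.compile(r"\x00(\d+)\x00")


class HalfParsed:
    """A half-parsed run of text: a string plus the nodes its placeholders stand for."""

    def __init__(self, text):
        self.text = text
        self.nodes = []

    def stash(self, node):
        # Store a finished node and return the placeholder that replaces it.
        self.nodes.append(node)
        return "\x00%d\x00" % (len(self.nodes) - 1)


def fill(parent, text, doc):
    """Turn placeholder text into ElementTree text, children and tails."""
    parts = PLACEHOLDER_RE.split(text)  # [text, index, text, index, ..., text]
    parent.text = parts[0]
    for i in range(1, len(parts), 2):
        child = doc.nodes[int(parts[i])]
        parent.append(child)
        child.tail = parts[i + 1]


def link_pattern(doc):
    def repl(m):
        a = ET.Element("a", href=m.group(2))
        a.text = m.group(1)
        return doc.stash(a)
    doc.text = re.sub(r"\[([^\]]*)\]\(([^)]*)\)", repl, doc.text)


def strong_pattern(doc):
    def repl(m):
        strong = ET.Element("strong")
        fill(strong, m.group(1), doc)  # inner text may hold earlier placeholders
        return doc.stash(strong)
    doc.text = re.sub(r"\*\*(.+?)\*\*", repl, doc.text)


INLINE_PATTERNS = [link_pattern, strong_pattern]


def heading(line):
    """Structural handler: '## Title' becomes an <h2> node built as a tree."""
    m = re.match(r"(#{1,6})\s+(.*)", line)
    if m is None:
        return None
    node = ET.Element("h%d" % len(m.group(1)))
    doc = HalfParsed(m.group(2))
    for pattern in INLINE_PATTERNS:
        pattern(doc)
    fill(node, doc.text, doc)
    return node


print(ET.tostring(heading("## **[foo](x.html)**"), encoding="unicode"))
# <h2><strong><a href="x.html">foo</a></strong></h2>
```

The cost of this design is that every inline pattern must tolerate placeholders inside its match. Here that works because `.` matches the NUL characters and fill() rebuilds the nesting.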
From: Ben W. <da...@gm...> - 2007-06-13 11:58:12
|
On 6/12/07, Yuri Takhteyev <qar...@gm...> wrote:
> This looks good. It is however, tied to the question of how the
> processors work, so those two issues need to be discussed together.
> This implementation assumes that everything is text-in-text-out.

The assumption you cite is based on my implementation of that class. What I offer is the core premise: casually linking processor order and heaping a final order. I realize TITO (text-in-text-out) is not how Markdown does things, and I anticipate that you would make the relevant changes. I'm familiar enough with how you wrote your implementation, but not enough to presume to offer a turnkey solution. I snipped out later comments where you tried to reconcile the difference. That is beyond my scope. :-) However, based on your later commentary, I've decided to re-tool what I offered so the resulting tool would be more universally applicable.

> [...]
> Another thing: Part of your code seems to implement a general
> register-deregister-sort logic which would potentially be useful for
> things other than markdown. Have you thought of wrapping it up into a
> separate module?

Actually, I have. After I posted the example to you, I noticed it would be preferable to abstract it out. However, my abstraction was still coupled to text manipulation. I believe I would remove the "parse" function.

> This way inside python-markdown one would simply use:
>
> import treeregistry ## just making up a name for now
>
> r = treeregistry.Registry()
> r.register('fulltext','>_begin')
> r.register('split','>fulltext')
> ...
> r.register('[[', 'links', r'(\[\[\s*(.*?)\]\])(s?)', make_link)
> load_extension(r)
> processors = r.get_sorted()
>
> Then from here on we just use a list of pre-sorted processors.

FWIW, I would suggest keeping 'sorted' as the method name, since it mirrors the Python built-in of the same name. You know what I mean, but to make sure I'm making my point, I'll explain. With .sort(), the list is sorted in place and None is returned.
Calling sorted() on a list returns a new, sorted list; the original list is left unchanged. So, to a Python programmer, r.sorted() should be self-documenting without the 'get_' prefix. But your general point is valid.

More importantly, you can get rid of the regex reference altogether. The code below should be close to how Python Markdown could use it:

    link_processor = (r'\[\[(.*?)\]\]', make_link)
    r.register('[[', 'links', link_processor)

Then the registry is completely agnostic. Your local use would extract the ordered list of tuples. Perhaps I could have a local use that extracted an ordered list of objects. Oh, crap; this just solved a sort issue I gave up on last fall! I'll re-tool the code to be agnostic and post it later today or tomorrow.

-- Ben Wilson "Words are the only thing which will last forever" Churchill |
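Once the registry is agnostic, the "local use" might look like the Python sketch below. The registry hands back its opaque payloads in order, and Markdown's side interprets each payload as a (pattern, replacement) tuple. The sketch rests on a few assumptions: `ordered` stands in for `r.sorted()`, and `make_link` and `apply_processors` are hypothetical helpers, not existing python-markdown functions.

```python
import re


def make_link(match):
    # Hypothetical handler: [[Front Page]] -> <a href="Front_Page">Front Page</a>
    target = match.group(1).strip()
    return '<a href="%s">%s</a>' % (target.replace(" ", "_"), target)


link_processor = (r"\[\[(.*?)\]\]", make_link)
strong_processor = (r"\*\*(.+?)\*\*", r"<strong>\1</strong>")

# Stand-in for r.sorted(): the registry only knows names and order,
# and returns the payloads it was given without inspecting them.
ordered = [strong_processor, link_processor]


def apply_processors(text, processors):
    """Markdown's local use: treat each payload as (pattern, replacement)."""
    for pattern, replacement in processors:
        text = re.sub(pattern, replacement, text)
    return text


print(apply_processors("See **[[Front Page]]**.", ordered))
# See <strong><a href="Front_Page">Front Page</a></strong>.
```

Because the registry never looks inside a payload, a different consumer could register objects instead of tuples and apply them its own way.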