From: Waylan L. <wa...@gm...> - 2008-02-21 21:29:48

On Thu, Feb 21, 2008 at 3:58 PM, David Wolever <wo...@cs...> wrote:
> On 21-Feb-08, at 3:34 PM, Waylan Limberg wrote:
> > On Thu, Feb 21, 2008 at 2:33 PM, David Wolever
> > <wo...@cs...> wrote:
> >> At the moment, a list of extension names is passed to Markdown(), and
> >> the Markdown class is responsible for loading them.
> >> This makes it harder to programmatically load extensions.
> > Harder in what way?
> At the moment, you've got to put your extension in a module called
> mdx_eggs.

OK, so you're trying to remove the requirement that each extension has to be in its own file. I suppose this would make a few things easier.

> >> I have written a patch (attached) so that a list of extension
> >> modules
> >> will be passed to Markdown().
> > One concern I have is that, unless I missed it, you have completely
> > removed the `extension_configs` arg. You only allow key=value pairs
> > separated by commas. The `extension_configs` arg makes it possible to
> > easily set configs programmatically or even pass in complex python data
> > structures.
> The assumption, which I guess I didn't make clear, was that the
> extensions passed to Markdown were "ready to go" -- they are just
> waiting for a call to extendMarkdown().
>
> So, if you were to be using the abbreviation extension, the code
> would look something like this:
> abbr = __import__("abbr")
> abbr = abbr.makeExtension(("abbrs", (("RAM", "random access memory"),
> ("SSH", "secure shell"))))
> Markdown(extensions = [abbr]).convert("I use SSH to check my free RAM")
>
> Alternately, the load_extension function could be modified to take a
> "config" parameter...

I would prefer to see the config parameter. There are people who will only ever use the extensions with no need to understand how they work. Seeing as we offered an arg before, let's keep it, or something similar, if we can.

--
----
Waylan Limberg
wa...@gm...
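Waylan's preference could be sketched as a `load_extension` helper that takes an optional config argument. This is a minimal illustration only: it assumes the 1.x convention of an `mdx_<name>` module exposing `makeExtension(configs)`, with `configs` as a list of `(key, value)` pairs, and the helper, the `ToyExtension` class and the in-memory `mdx_abbr` module are all hypothetical, not Python-Markdown's actual API.

```python
import importlib
import sys
import types


def load_extension(name, config=None):
    """Hypothetical helper: import mdx_<name> and build a configured extension.

    Assumes the module defines makeExtension(configs), where configs is a
    list of (key, value) pairs.
    """
    module = importlib.import_module("mdx_" + name)
    configs = list((config or {}).items())
    return module.makeExtension(configs)


# --- Demo only: register a fake extension module so the sketch runs. ---
class ToyExtension:
    def __init__(self, configs):
        self.config = dict(configs)  # keep settings as a dict for lookup


fake_module = types.ModuleType("mdx_abbr")
fake_module.makeExtension = ToyExtension
sys.modules["mdx_abbr"] = fake_module

# Casual users pass plain data; nobody has to call makeExtension directly.
abbr = load_extension("abbr", {"abbrs": {"RAM": "random access memory",
                                         "SSH": "secure shell"}})
print(abbr.config["abbrs"]["SSH"])  # secure shell
```

The design keeps a plain-data path for people who never look inside an extension, while still handing back a ready-to-go object.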
From: David W. <wo...@cs...> - 2008-02-21 20:59:12

On 21-Feb-08, at 3:34 PM, Waylan Limberg wrote:
> On Thu, Feb 21, 2008 at 2:33 PM, David Wolever
> <wo...@cs...> wrote:
>> At the moment, a list of extension names is passed to Markdown(), and
>> the Markdown class is responsible for loading them.
>> This makes it harder to programmatically load extensions.
> Harder in what way?

At the moment, you've got to put your extension in a module called mdx_eggs.

In the case of DrProject (and, I believe, Trac... but there are doubtless others), each component provides its own syntax (among other things), so it is preferable to have a loop like:

    for provider in wiki_syntax_providers:
        extensions.append(provider.get_wiki_syntax())

where get_wiki_syntax() returns an object which is passed directly to Markdown:

    Markdown(extensions=extensions).convert("...")

>> I have written a patch (attached) so that a list of extension
>> modules
>> will be passed to Markdown().
> One concern I have is that, unless I missed it, you have completely
> removed the `extension_configs` arg. You only allow key=value pairs
> separated by commas. The `extension_configs` arg makes it possible to
> easily set configs programmatically or even pass in complex python data
> structures.

The assumption, which I guess I didn't make clear, was that the extensions passed to Markdown were "ready to go" -- they are just waiting for a call to extendMarkdown().

So, if you were to be using the abbreviation extension, the code would look something like this:

    abbr = __import__("abbr")
    abbr = abbr.makeExtension(("abbrs", (("RAM", "random access memory"),
                                         ("SSH", "secure shell"))))
    Markdown(extensions = [abbr]).convert("I use SSH to check my free RAM")

Alternately, the load_extension function could be modified to take a "config" parameter...
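David's "ready to go" model can be sketched with stand-in classes: each extension object is configured before it reaches the converter, and the converter only calls `extendMarkdown()` on it. `ToyMarkdown`, `AbbrExtension` and `WikiSyntaxProvider` are hypothetical names for illustration; the real `Markdown` class registers patterns and processors rather than plain string callbacks.

```python
class ToyMarkdown:
    """Hypothetical stand-in for Markdown(): runs text callbacks that
    extensions register in extendMarkdown()."""

    def __init__(self, extensions=()):
        self.transforms = []
        for ext in extensions:
            ext.extendMarkdown(self)  # extension is already configured

    def convert(self, text):
        for transform in self.transforms:
            text = transform(text)
        return text


class AbbrExtension:
    def __init__(self, abbrs):
        self.abbrs = dict(abbrs)

    def extendMarkdown(self, md):
        md.transforms.append(self.expand)

    def expand(self, text):
        # Naive string replacement: fine for a demo, not for real documents.
        for short, full in self.abbrs.items():
            text = text.replace(short, '<abbr title="%s">%s</abbr>' % (full, short))
        return text


class WikiSyntaxProvider:
    """Hypothetical component that contributes its own syntax."""

    def __init__(self, abbrs):
        self.abbrs = abbrs

    def get_wiki_syntax(self):
        return AbbrExtension(self.abbrs)


wiki_syntax_providers = [WikiSyntaxProvider({"RAM": "random access memory"}),
                         WikiSyntaxProvider({"SSH": "secure shell"})]
extensions = []
for provider in wiki_syntax_providers:
    extensions.append(provider.get_wiki_syntax())

html = ToyMarkdown(extensions=extensions).convert("I use SSH to check my free RAM")
print(html)
```

The converter never needs to know where an extension came from or how it was configured; that knowledge stays with each provider.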
From: Waylan L. <wa...@gm...> - 2008-02-21 20:34:38

David,

On Thu, Feb 21, 2008 at 2:33 PM, David Wolever <wo...@cs...> wrote:
> At the moment, a list of extension names is passed to Markdown(), and
> the Markdown class is responsible for loading them.
> This makes it harder to programmatically load extensions.

Harder in what way? I guess I'm not seeing the problem you're trying to solve. Before we go and break the api and people have to rewrite all their extensions (we've had indications in the past that some people are using extensions they've never made public), I'd like to understand why, that's all.

>
> I have written a patch (attached) so that a list of extension modules
> will be passed to Markdown().

One concern I have is that, unless I missed it, you have completely removed the `extension_configs` arg. You only allow key=value pairs separated by commas. The `extension_configs` arg makes it possible to easily set configs programmatically or even pass in complex python data structures.

Consider the Abbreviation Extension[1]. Among other ways to define abbreviations, it accepts a dictionary of abbreviation definitions through the `extension_configs` arg. One use case is a project that stores a list of abbreviations in a database or some other non-flat-file format. That project's code could then gather those definitions into a dict and pass them in. Perhaps each user has a different list of definitions. Or the same abbreviation could have different meanings in different contexts, so the list is adjusted based upon the category of the document. You get the idea. I'd like to keep that ability. Although, that is my extension so I'm partial.

[1]: http://achinghead.com/markdown/abbr/

> When running from the command line, extensions will be loaded as the
> command line arguments are parsed (through the new load_extension
> function).
>
> This will break the Markdown interface, though, so I haven't
> committed the code.
>
> What do you think?
>
> David

--
----
Waylan Limberg
wa...@gm...
From: David W. <wo...@cs...> - 2008-02-21 20:21:01

Like, for backwards compatibility? Yup, that's certainly a possibility. If there is general approval, I'll code it up.

On 21-Feb-08, at 3:17 PM, Yuri Takhteyev wrote:
> Can't we check for type of parameters to see whether they are names or
> loaded modules?
>
> - yuri
>
> On Thu, Feb 21, 2008 at 11:33 AM, David Wolever
> <wo...@cs...> wrote:
>> At the moment, a list of extension names is passed to Markdown(), and
>> the Markdown class is responsible for loading them.
>> This makes it harder to programmatically load extensions.
>>
>> I have written a patch (attached) so that a list of extension
>> modules
>> will be passed to Markdown().
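Yuri's suggestion amounts to a small dispatch on argument type. A minimal sketch, written for Python 3 (where names are `str`; Python 2 code of the time would test `basestring`); `resolve_extensions` and the `loader` callback are hypothetical names, with `loader` standing in for whatever the existing name-based loading path is.

```python
def resolve_extensions(extensions, loader):
    """Accept both old-style names and new-style extension objects.

    Strings are treated as extension names and passed to loader();
    anything else is assumed to be a ready-to-use extension object.
    """
    resolved = []
    for ext in extensions:
        if isinstance(ext, str):
            resolved.append(loader(ext))  # backwards-compatible path
        else:
            resolved.append(ext)  # already loaded and configured
    return resolved


class Ready:
    """Dummy pre-built extension object for the demo."""

    def extendMarkdown(self, md):
        pass


ready = Ready()
mixed = resolve_extensions(["footnotes", ready],
                           loader=lambda name: ("loaded", name))
print(mixed[0])  # ('loaded', 'footnotes')
```

Because the check only separates strings from everything else, existing callers that pass names keep working unchanged.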
From: David W. <wo...@cs...> - 2008-02-21 19:33:58

At the moment, a list of extension names is passed to Markdown(), and the Markdown class is responsible for loading them. This makes it harder to programmatically load extensions.

I have written a patch (attached) so that a list of extension modules will be passed to Markdown(). When running from the command line, extensions will be loaded as the command line arguments are parsed (through the new load_extension function).

This will break the Markdown interface, though, so I haven't committed the code.

What do you think?

David
From: Waylan L. <wa...@gm...> - 2008-02-19 21:30:01

After posting this last night I spent some time playing with an idea I had. What if the inlinePatterns had two stages? In the first stage, the regex is run against the text and any resulting matches are stored for later retrieval. Throughout this process the text remains a single string. Then, only after all the patterns have run and all the matches have been found, do we modify the string by looping through the matches and calling the handleMatch method of each pattern.

The result is here [1]. It doesn't currently handle nesting well (or at all), but that should be fairly easy; the api for storage is ugly (really ugly), and it's probably terribly slow. I'm also not using the dom, but that should be easy to change. It certainly is not ready for public consumption. But what do you think? Is it worth further effort?

What I find most compelling is that, with dom support added back in, this continues to support the current extension api. In fact, there should be little to no need for adjustments to existing extensions.

In any event, there are some other good ideas here. Perhaps with a little from everyone, we'll see something that works. And I'm half inclined to drop the dom from inlinePatterns as well.

[1]: https://code.achinghead.com/browser/md_branches/inlinePatterns/patterns.py

--
----
Waylan Limberg
wa...@gm...
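The two-stage idea can be sketched in a few lines. This is a hypothetical illustration, not the code in patterns.py: it builds HTML strings instead of dom nodes, handles only links and `**strong**`, and deals with nesting by re-running both stages on each match's captured text.

```python
import re

# Hypothetical pattern table: (compiled regex, handler). Each handler gets
# the match plus its already-processed inner text and returns HTML.
PATTERNS = [
    (re.compile(r"\[([^\]]*)\]\(([^)]*)\)"),
     lambda m, inner: '<a href="%s">%s</a>' % (m.group(2), inner)),
    (re.compile(r"\*\*(.+?)\*\*"),
     lambda m, inner: "<strong>%s</strong>" % inner),
]


def inline(text):
    # Stage 1: run every pattern against the untouched string and
    # record the matches; the text itself is not modified yet.
    found = []
    for regex, handler in PATTERNS:
        for m in regex.finditer(text):
            found.append((m.start(), -m.end(), m, handler))
    # Leftmost first; for equal starts, the longest (outermost) first.
    found.sort(key=lambda item: (item[0], item[1]))

    # Stage 2: build the output from the outermost, non-overlapping matches.
    out, pos = [], 0
    for start, _, m, handler in found:
        if start < pos:
            continue  # inside a match already used; covered by the recursion
        out.append(text[pos:start])
        out.append(handler(m, inline(m.group(1))))  # recurse for nesting
        pos = m.end()
    out.append(text[pos:])
    return "".join(out)


print(inline("A **bold [link](http://example.com)** currently does not work."))
```

Since every pattern sees the whole string in stage 1, neither example sentence from the original bug report depends on pattern order.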
From: Yuri T. <qar...@gm...> - 2008-02-19 18:17:45

> The nice property that you lose here is that you can't guarantee you'll always
> generate valid html/xml. Of course, you might not care about that, since
> Markdown will include any old stuff from the user, but if you cared, using dom
> trees gives you that guarantee.

Well, as you noted, since Markdown allows you to insert anything that vaguely looks like html, you already have no guarantee of getting valid HTML in the end. (Unless you are using the "safe" option and discarding HTML.) Note that this approach should still give you valid HTML for any correct input.

- yuri
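The guarantee under discussion is easy to see with the standard library's `xml.etree.ElementTree`, used here only as a stand-in for the dom tree Python-Markdown 1.x builds: text placed in a tree node is escaped when serialized, while text pasted into a string template is not.

```python
import xml.etree.ElementTree as ET

user_text = "a < b & c"

# String building: the user's text goes in verbatim, so the result is
# not well-formed XML.
by_string = "<p>%s</p>" % user_text

# Tree building: the serializer escapes special characters in text nodes.
p = ET.Element("p")
p.text = user_text
by_tree = ET.tostring(p, encoding="unicode")

print(by_string)  # <p>a < b & c</p>
print(by_tree)    # <p>a &lt; b &amp; c</p>
```

As noted above, raw HTML pass-through already weakens this guarantee for the output as a whole; the tree only protects the parts Markdown itself generates.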
From: Blake W. <bw...@la...> - 2008-02-19 15:29:24

Yuri Takhteyev wrote:
>> Of course, both should work, so we may need a new approach to
>> the inlinePatterns. Any ideas?
> What I tried at the time
> was storing a string which uses a special Unicode character to mark the
> positions where the nodes are supposed to be included. I.e., if "⊙"
> is the special character, we could store something like:
> ["A **⊙** currently does not work.", <link>]

If your Unicode "character" were instead "%s", you could put the doms in a list, and repeatedly string-interpolate them... i.e. you would end up with (using pretend dom syntax):

    values = ("A %s currently does not work",
              ((dom("b","%s")),
               (dom("a", {'href':'index.html'}, "foo"))
              ))

and then you could loop through, doing something like:

    template = values[0]
    substitutions = values[1]
    for subs in substitutions:
        template %= subs

and, as long as your %s were escaped the correct number of times, you should be good to go.

(If you went down that road, I might suggest using a dictionary, so that it was easier to see what was going on. The data in that case (if you just used strings) would look more like:

    values = ("A %(bold)s currently does not work",
              ({'bold': "<b>%(code)s</b>"},
               {'code': "<a href='index.html'>foo</a>"}
              ))

Where each processor got to choose its own namespace.

> This would allow us to run REs (if we are careful) and still get the
> dom tree in the end.

Hmm... Thinking about that a little started me wondering... If you end up with stuff in the wrong order it still wouldn't work. Unless you ran the inline parsers on the data of each substitution, which is probably a good idea, come to think of it. (And then override that method in the CodeProcessor to not call Markdown on its internal data.)

> Another possibility is to only use dom trees for high-level elements
> (lists, code blocks, quotes, etc), and reduce inline patterns to
> simple REs (each run on one element of the larger tree at a time).

The nice property that you lose here is that you can't guarantee you'll always generate valid html/xml. Of course, you might not care about that, since Markdown will include any old stuff from the user, but if you cared, using dom trees gives you that guarantee.

> I don't have time at the moment for such a major overhaul (this would
> basically be Python-Markdown 2.0), but if someone else does then I
> think this is the way to go. I am also pretty sure that this would
> give us a sizeable performance boost.

You've got to love performance boosts. :)

Later,
Blake.
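The dictionary variant can be tried directly with Python's `%` operator. A minimal sketch; the `fill` helper is a hypothetical name, and the second call shows what "escaped the correct number of times" means in practice: a literal `%` must be written as `%%` once for every substitution pass still to come.

```python
def fill(template, substitutions):
    """Apply each mapping in turn; a value may introduce new %(name)s
    slots that a later mapping fills in."""
    for subs in substitutions:
        template %= subs
    return template


values = ("A %(bold)s currently does not work",
          ({"bold": "<b>%(code)s</b>"},
           {"code": "<a href='index.html'>foo</a>"}))
print(fill(*values))
# A <b><a href='index.html'>foo</a></b> currently does not work

# Two passes remain, so a literal "%" needs two rounds of doubling.
print(fill("50%%%% off %(x)s", ({"x": "%(y)s"}, {"y": "today"})))
# 50% off today
```

The same doubling requirement applies to any `%` that arrives in user text, which is one reason this scheme needs care before being used on real documents.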
From: Yuri T. <qar...@gm...> - 2008-02-19 08:07:20

> The easy solution is to reverse the order of the inlinePatterns. But
> then we can't do the first example as the link syntax is broken up in
> the same way. Now, if no one ever uses that syntax, that would be
> fine. Of course, both should work, so we may need a new approach to
> the inlinePatterns. Any ideas?

I've thought about this issue before and I think there are basically two solutions (apart from the zeroth solution of just dealing with it).

One, somewhat tricky, would be to implement some kind of data structure that contains a mixture of strings and dom nodes and works with REs. It's not impossible, and I got half-way there implementing it in 2006, but then didn't have time to finish. What I tried at the time was storing a string which uses a special Unicode character to mark the positions where the nodes are supposed to be included. I.e., if "⊙" is the special character, we could store something like:

    ["A **⊙** currently does not work.", <link>]

This would allow us to run REs (if we are careful) and still get the dom tree in the end.

Another possibility is to only use dom trees for high-level elements (lists, code blocks, quotes, etc), and reduce inline patterns to simple REs (each run on one element of the larger tree at a time). The second solution would break some old extensions, but I think it's overall simpler and better. To give credit where credit is due, this is basically Ben Wilson's suggestion from last summer:

https://sourceforge.net/mailarchive/forum.php?thread_name=cc6097050704100456x4daa81f0i9ca0137b6c484ba4%40mail.gmail.com&forum_name=python-markdown-discuss

I don't have time at the moment for such a major overhaul (this would basically be Python-Markdown 2.0), but if someone else does then I think this is the way to go. I am also pretty sure that this would give us a sizeable performance boost.

- yuri
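The placeholder idea can be sketched with `re.sub` and `xml.etree.ElementTree` (standing in for the 1.x dom). Two assumptions go beyond the description above: instead of a single "⊙", each stored node gets its own private-use Unicode character so placeholders map back to nodes unambiguously, and each pattern re-runs the patterns on its own captured text. The `Stash` class and `inline` function are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# Each finished node gets its own private-use code point (U+E000 onward),
# assumed never to occur in user text.
BASE = 0xE000
PLACEHOLDER_RE = re.compile("([\ue000-\uf8ff])")


class Stash:
    """Finished nodes, referenced from the working string by placeholders."""

    def __init__(self):
        self.nodes = []

    def put(self, node):
        self.nodes.append(node)
        return chr(BASE + len(self.nodes) - 1)

    def attach(self, parent, text):
        """Append text to parent, swapping placeholders back for nodes."""
        last = None  # most recent child, whose .tail receives following text
        for piece in PLACEHOLDER_RE.split(text):
            if PLACEHOLDER_RE.fullmatch(piece):
                last = self.nodes[ord(piece) - BASE]
                parent.append(last)
            elif piece and last is None:
                parent.text = (parent.text or "") + piece
            elif piece:
                last.tail = (last.tail or "") + piece
        return parent


def inline(text):
    stash = Stash()

    def run_patterns(s):
        # Plain regexes over a plain string: earlier results are single
        # placeholder characters, so later patterns can match across them.
        s = re.sub(r"\[([^\]]*)\]\(([^)]*)\)", link, s)
        return re.sub(r"\*\*(.+?)\*\*", strong, s)

    def link(m):
        a = ET.Element("a", href=m.group(2))
        return stash.put(stash.attach(a, run_patterns(m.group(1))))

    def strong(m):
        el = ET.Element("strong")
        return stash.put(stash.attach(el, run_patterns(m.group(1))))

    return stash.attach(ET.Element("p"), run_patterns(text))


tree = inline("A **bold [link](http://example.com)** currently does not work.")
print(ET.tostring(tree, encoding="unicode"))
# <p>A <strong>bold <a href="http://example.com">link</a></strong> currently does not work.</p>
```

Because every finished element collapses to one ordinary character, a later regex can match across it, which is exactly what the list-of-strings representation prevents.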
From: Waylan L. <wa...@gm...> - 2008-02-18 23:58:56

We have a few bugs in our tracker that highlight a limitation of the inlinePatterns. I'd like some feedback about which behavior is preferred or if we should look for a different way of doing things.

Currently, it's only possible for one of the following to work in python-markdown:

    A bold [**link**](http://example.com) currently works fine.

    A **bold [link](http://example.com)** currently does not work.

For those that care, here's why:

Markdown parses the first line and finds the link. The label `**link**` is then run through all remaining inlinePatterns and properly identified as bold text.

The second line is parsed and, as the link pattern is run first, the link is found and a link element is created. That line of text is now represented as the following list in python:

    ["A **bold ", <markdown.element>, "** currently does not work.\n"]

Any remaining patterns are then run against each string in that list. The problem should be obvious by now. The opening and closing `**` are in separate strings split by the link element, so no match is made. Finally, the list is looped through, any remaining strings are converted to textnodes, and the entire thing is added to the dom inside a paragraph element.

The easy solution is to reverse the order of the inlinePatterns. But then we can't do the first example as the link syntax is broken up in the same way. Now, if no one ever uses that syntax, that would be fine. Of course, both should work, so we may need a new approach to the inlinePatterns. Any ideas?

--
----
Waylan Limberg
wa...@gm...
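The failure is easy to reproduce with the `STRONG_RE` regex from markdown.py (as quoted in Blake's patch further down this page) and a placeholder object for the link element. A minimal sketch; `LinkElement` is a hypothetical stand-in for the `<markdown.element>` in the list.

```python
import re

STRONG_RE = re.compile(r"\*\*(.*)\*\*")  # **strong**


class LinkElement:
    """Hypothetical stand-in for the link element created first."""


pieces = ["A **bold ", LinkElement(), "** currently does not work.\n"]

# Later patterns only ever see one string at a time...
per_string = [STRONG_RE.search(p) for p in pieces if isinstance(p, str)]
print(per_string)  # [None, None]

# ...even though the unsplit text would have matched.
print(bool(STRONG_RE.search("A **bold LINK** currently does not work.")))  # True
```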
From: Waylan L. <wa...@gm...> - 2008-02-18 02:00:35

Python-Markdown 1.7 final is now available for download [1]. There were only a few minor bug fixes since release candidate 1 was released in December. However, you may still want to upgrade.

[1]: https://sourceforge.net/project/showfiles.php?group_id=153041

--
----
Waylan Limberg
wa...@gm...
From: Waylan L. <wa...@gm...> - 2008-02-17 23:53:56

On Feb 16, 2008 11:05 PM, Blake Winton <bw...@la...> wrote:
> Waylan Limberg wrote:
> > Blake, this is an interesting approach I hadn't thought of and I like it.
>
> Thanks!
>
> > When we run through the unescape code,
> > how do we know that it was an escaped character, not something
> > intentionally typed by the author that should be left alone?
> > Or is that why you also escape the `&`?
>
> Yup, that's exactly why I escape the &.
>
> > Either way, I was never particularly fond of the way markdown
> > currently handles escaping. I think your solution is much more elegant
> > and I'd prefer to use something very much like it.
> >
> > However, I'm thinking that one of two things needs to happen:
> >
> > 1. The escaping needs to happen at the inline pattern level (or at
> > least right before/after them, perhaps in _handleInline), rather than
> > on the entire document. That eliminates the need to worry about
> > messing with code blocks and raw html.
>
> Well, I think that escapes can happen (almost) anywhere, not just in
> inline patterns. (Unless inline patterns handle everything except code
> blocks...) Reading the documentation again, I'm not entirely sure why I
> believe that. Perhaps confirmation from Mr. Gruber is in order.

Something to keep in mind here: Python-Markdown is not coded against markdown.pl, but rather against Mr. Gruber's syntax rules. The internal implementation can be whatever we want it to be as long as the end result gets us the expected output. Mr. Gruber himself has said that there are some situations where markdown.pl is actually wrong and should not be looked to as an example.

Anyway, yes, in Python-Markdown inlinePatterns never see code blocks.

>
> Either way, though, I think that the other factor that kind of renders
> this moot is that any characters that get escaped will get unescaped on
> the way out, so it's as if we didn't do anything to them at all. We're
> just hiding them from the internal processors. And the internal
> processors can even pay attention to them if they want, since our
> mapping is consistent. (If you see "&#x[0-9a-zA-Z]+;", you know it's an
> escaped character. If you see "&#x26;", you know it was an &.)

Yeah, that's clear to me now. If the author typed "&" it'd escape to "&#x26;" and so on. I don't see any reason why we can't go with option 2 then.

>
> > 2. Assuming the above concern is moot, the escaping and unescaping
> > should be pre and post processors respectively. Check the doc strings
> > on the latest in svn (see below) for TextPreprocessor (line 413) and
> > TextPostprocessor (line 945) which run on the entire document as a
> > single string rather than split lines.
>
> Yeah, I could see that. Were the pre and post processors in 1.6 and I
> just missed them?

That depends on whether you have 1.6, 1.6a or 1.6b. Things kind of got messed up with that release. Some code in that release never made it into svn and some code in svn never made it into that release. 1.6b is the only one I recommend people use. Unfortunately, I'm not sure it's the one available with easy_install.

Regardless, the TextPostprocessor is brand new. In fact, the code that removes and restores raw html is now implemented as text pre/post processors, so we can easily avoid escaping raw html if we use them.

> As a side note, I'm running against 1.6, because that
> was what got installed when I typed "easy_install markdown". :)

Either tonight or tomorrow I should be releasing 1.7 (it's currently at a release candidate, so I don't expect your patch to make it in; sorry).

>
> > I appreciate all your hard work on this. However, it would be helpful
> > if your diffs were against the latest in svn. There have been quite a few
> > changes since 1.6.
>
> Fair enough. I'll see when I next have some free time, and give that a try.
>
> > You can checkout from here:
> > svn co https://python-markdown.svn.sourceforge.net/svnroot/python-markdown
> > python-markdown
> > Which includes the test suite so you can check whether you're breaking
> > anything else.
>
> Ah, I was wondering where that was... Okay, I've got a copy in my
> python 2.5 installation, and all the tests pass, so I should be good to
> go with my modifications, I believe.
>
> Thanks,
> Blake.

--
----
Waylan Limberg
wa...@gm...
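The point about raw html handling living in text pre/post processors can be illustrated with a small stash. This is a hypothetical sketch, not the svn code: `RawHtmlStash`, `convert`, the placeholder format and the emphasis-only `emphasis` body step are invented for illustration, and the tag regex is adapted from `HTML_RE` as quoted in Blake's patch.

```python
import re

HTML_RE = re.compile(r"<[a-zA-Z/][^>]*>")  # adapted from HTML_RE in markdown.py


class RawHtmlStash:
    """Hides raw HTML tags before processing and restores them afterwards."""

    PLACEHOLDER = "\x02html%d\x03"  # control chars assumed absent from input

    def __init__(self):
        self.blocks = []

    def hide(self, text):  # text preprocessor: sees the whole document
        def stash(m):
            self.blocks.append(m.group(0))
            return self.PLACEHOLDER % (len(self.blocks) - 1)
        return HTML_RE.sub(stash, text)

    def restore(self, text):  # text postprocessor
        for i, block in enumerate(self.blocks):
            text = text.replace(self.PLACEHOLDER % i, block)
        return text


def convert(text, preprocessors, body, postprocessors):
    for pre in preprocessors:
        text = pre(text)
    text = body(text)  # stands in for all of the real block/inline work
    for post in postprocessors:
        text = post(text)
    return text


def emphasis(text):
    return re.sub(r"\*([^*]+)\*", r"<em>\1</em>", text)


source = "*hi* <span title='*x*'>y</span>"
stash = RawHtmlStash()
print(convert(source, [stash.hide], emphasis, [stash.restore]))
# <em>hi</em> <span title='*x*'>y</span>
print(emphasis(source))
# <em>hi</em> <span title='<em>x</em>'>y</span>
```

An escaping preprocessor could be slotted in after `hide` in the same way, so tags that are already hidden are never touched.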
From: Blake W. <bw...@la...> - 2008-02-17 04:19:32

Blake Winton wrote:
> Waylan Limberg wrote:
>> 1. The escaping needs to happen at the inline pattern level (or at
>> least right before/after them, perhaps in _handleInline), rather than
>> on the entire document. That eliminates the need to worry about
>> messing with code blocks and raw html.
>
> Well, I think that escapes can happen (almost) anywhere, not just in
> inline patterns. (Unless inline patterns handle everything except code
> blocks...) Reading the documentation again, I'm not entirely sure why I
> believe that. Perhaps confirmation from Mr. Gruber is in order.

And searching the web, I see
http://www.koders.com/noncode/fidB157BEC7D3FDEBB090E6CB9A1C01C4C4DD5DD434.aspx
which says:

    1.0.1 (14 Dec 2004):

    + Changed the syntax rules for code blocks and spans. Previously,
      backslash escapes for special Markdown characters were processed
      everywhere other than within inline HTML tags. Now, the contents
      of code blocks and spans are no longer processed for backslash
      escapes. This means that code blocks and spans are now treated
      literally, with no special rules to worry about regarding
      backslashes.

      **NOTE**: This changes the syntax from all previous versions of
      Markdown. Code blocks and spans involving backslash characters
      will now generate different output than before.

So I guess escapes are only triggered within, uh, anything which isn't a code block or code span or inline html. Hmm, that's going to be ugly. Perhaps I'll just escape them all, and then in those three cases, handle the unescapes. Bleah.

Is there a parent class for stuff which isn't a code block, code span, or inline html, by any chance? :)

Thanks,
Blake.
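One way to get the 1.0.1 behaviour without escaping everything and undoing it later is a single left-to-right scan that recognizes either a backslash escape or a whole code span, whichever starts first. A sketch under stated assumptions: only inline code spans delimited by matching backtick runs are handled (code blocks and inline html are not), the escapable set is the one from Blake's patch, and `escape_outside_code` is a hypothetical name.

```python
import re

# Either a backslash escape (group 1) or a code span (groups 2 and 3).
TOKEN_RE = re.compile(r"\\([\\`*_{}\[\]()#+.!-])|(`+)(.+?)\2", re.DOTALL)


def _replace(m):
    if m.group(1) is not None:
        return "&#x%x;" % ord(m.group(1))  # escape outside code
    return m.group(0)  # code span: copied literally, backslashes and all


def escape_outside_code(text):
    return TOKEN_RE.sub(_replace, text)


print(escape_outside_code(r"\*not em\* but `a\*b` stays"))
# &#x2a;not em&#x2a; but `a\*b` stays
print(escape_outside_code(r"\`not code\`"))
# &#x60;not code&#x60;
```

Because the scan consumes a whole code span in one match, backslashes inside it are never seen as escapes, while an escaped backtick outside code is consumed before it can open a span.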
From: Blake W. <bw...@la...> - 2008-02-17 04:05:58

Waylan Limberg wrote:
> Blake, this is an interesting approach I hadn't thought of and I like it.

Thanks!

> When we run through the unescape code,
> how do we know that it was an escaped character, not something
> intentionally typed by the author that should be left alone?
> Or is that why you also escape the `&`?

Yup, that's exactly why I escape the &.

> Either way, I was never particularly fond of the way markdown
> currently handles escaping. I think your solution is much more elegant
> and I'd prefer to use something very much like it.
>
> However, I'm thinking that one of two things needs to happen:
>
> 1. The escaping needs to happen at the inline pattern level (or at
> least right before/after them, perhaps in _handleInline), rather than
> on the entire document. That eliminates the need to worry about
> messing with code blocks and raw html.

Well, I think that escapes can happen (almost) anywhere, not just in inline patterns. (Unless inline patterns handle everything except code blocks...) Reading the documentation again, I'm not entirely sure why I believe that. Perhaps confirmation from Mr. Gruber is in order.

Either way, though, I think that the other factor that kind of renders this moot is that any characters that get escaped will get unescaped on the way out, so it's as if we didn't do anything to them at all. We're just hiding them from the internal processors. And the internal processors can even pay attention to them if they want, since our mapping is consistent. (If you see "&#x[0-9a-zA-Z]+;", you know it's an escaped character. If you see "&#x26;", you know it was an &.)

> 2. Assuming the above concern is moot, the escaping and unescaping
> should be pre and post processors respectively. Check the doc strings
> on the latest in svn (see below) for TextPreprocessor (line 413) and
> TextPostprocessor (line 945) which run on the entire document as a
> single string rather than split lines.

Yeah, I could see that. Were the pre and post processors in 1.6 and I just missed them? As a side note, I'm running against 1.6, because that was what got installed when I typed "easy_install markdown". :)

> I appreciate all your hard work on this. However, it would be helpful
> if your diffs were against the latest in svn. There have been quite a few
> changes since 1.6.

Fair enough. I'll see when I next have some free time, and give that a try.

> You can checkout from here:
> svn co https://python-markdown.svn.sourceforge.net/svnroot/python-markdown
> python-markdown
> Which includes the test suite so you can check whether you're breaking
> anything else.

Ah, I was wondering where that was... Okay, I've got a copy in my python 2.5 installation, and all the tests pass, so I should be good to go with my modifications, I believe.

Thanks,
Blake.
From: Waylan L. <wa...@gm...> - 2008-02-17 02:59:40

Blake, this is an interesting approach I hadn't thought of and I like it. However, what about code blocks and spans in which the author had included an escape sequence? When we run through the unescape code, how do we know that it was an escaped character, not something intentionally typed by the author that should be left alone? For example:

    The escape sequence for a backslash (`\\`) is: `&#x5c;`.

Or is that why you also escape the `&`?

Either way, I was never particularly fond of the way markdown currently handles escaping. I think your solution is much more elegant and I'd prefer to use something very much like it.

However, I'm thinking that one of two things needs to happen:

1. The escaping needs to happen at the inline pattern level (or at least right before/after them, perhaps in _handleInline), rather than on the entire document. That eliminates the need to worry about messing with code blocks and raw html.

2. Assuming the above concern is moot, the escaping and unescaping should be pre and post processors respectively. Check the doc strings on the latest in svn (see below) for TextPreprocessor (line 413) and TextPostprocessor (line 945) which run on the entire document as a single string rather than split lines.

I appreciate all your hard work on this. However, it would be helpful if your diffs were against the latest in svn. There have been quite a few changes since 1.6. You can checkout from here:

    svn co https://python-markdown.svn.sourceforge.net/svnroot/python-markdown python-markdown

Which includes the test suite so you can check whether you're breaking anything else. Or if you don't have access to subversion, at least the file markdown.py is available here:

http://python-markdown.svn.sourceforge.net/viewvc/python-markdown/markdown.py?view=markup

--
----
Waylan Limberg
wa...@gm...
From: Blake W. <bw...@la...> - 2008-02-17 00:46:04
|
Blake Winton wrote: > Hey, what if the escape character turned the following character into > its hex-escape, as a pre-transformation? Something along these lines: > In [4]: def hexesc(m): > ...: return "&#x%x;" % ord(m.group(1)) > ...: > > In [5]: re.sub( r"\\(.)", hexesc, "abc \\ def" ) > Out[5]: 'abc   def' > > and then take that string, and run it through the patterns? Well, I started in on this, and I think I've got something that's at least proof-of-concept material... In [2]: print markdown.markdown( r"\``\`abc\``\`d&ef " ).strip() <p>`<code>`abc`</code>`d&ef  </p> I'm sure there are bugs, (because it's software ;) and the duplication of the unescape method is a little ugly, but hopefully someone more in tune with the codebase can take the patch and make it pretty and more correct. (One of the bugs I found before I sent this was that bare &s weren't getting translated into & Fortunately, it was easy enough to fix.) For those who are interested, here's the explanation of the patch, hunk by hunk: ---------------------- @@ -220,6 +220,10 @@ attrRegExp = re.compile(r'\{@([^\}]*)=([^\}]*)}') # {@id=123} def __init__ (self, text) : + def hexunesc(m): + return "%c" % chr(int(m.group(0)[3:-1],16)) + unescapeChars = r"&#x[0-9A-Fa-f]+;" + text = re.sub( unescapeChars, hexunesc, text ) self.value = text def attributeCallback(self, match) : ---------------------- This is TextNode's init, where it is being passed escaped characters, and so it unescapes them. ---------------------- @@ -488,8 +492,8 @@ Also note that all the regular expressions used by inline must capture the whole block. For this reason, they all start with -'^(.*)' and end with '(.*)!'. In case with built-in expression -Pattern takes care of adding the "^(.*)" and "(.*)!". +'^(.*)' and end with '(.*)$'. In case with built-in expression +Pattern takes care of adding the "^(.*)" and "(.*)$". Finally, the order in which regular expressions are applied is very important - e.g. 
if we first replace http://.../ links with <a> tags ---------------------- Just some typo fixes I thought I'ld throw in there. ---------------------- @@ -518,9 +522,8 @@ + (NOBRACKET+ r'\])*'+NOBRACKET)*6 + NOBRACKET + r')\]' ) -BACKTICK_RE = r'\`([^\`]*)\`' # `e= m*c^2` -DOUBLE_BACKTICK_RE = r'\`\`(.*)\`\`' # ``e=f("`")`` -ESCAPE_RE = r'\\(.)' # \< +BACKTICK_RE = r'`([^`]*)`' # `e= m*c^2` +DOUBLE_BACKTICK_RE = r'``(.*?)``' # ``e=f("`")`` EMPHASIS_RE = r'\*([^\*]*)\*' # *emphasis* STRONG_RE = r'\*\*(.*)\*\*' # **strong** STRONG_EM_RE = r'\*\*\*([^_]*)\*\*\*' # ***strong*** ---------------------- Since we are handling escapes at a different level, we don't need the regex for them anymore. Also, ` isn't a special character in regexes, so we don't need to \-escape it. Finally, I've made the double-backtick regular expression non-greedy (by adding the ? in the (.*?), since I think that ``abc`` def ``ghi`` should probably be two code blocks. ---------------------- @@ -540,16 +543,16 @@ IMAGE_REFERENCE_RE = r'\!' + BRK + '\s*\[([^\]]*)\]' # ![alt text][2] NOT_STRONG_RE = r'( \* )' # stand-alone * or _ AUTOLINK_RE = r'<(http://[^>]*)>' # <http://www.123.com> -AUTOMAIL_RE = r'<([^> \!]*@[^> ]*)>' # <me...@ex...> -#HTML_RE = r'(\<[^\>]*\>)' # <...> +AUTOMAIL_RE = r'<([^> \!]*@[^> ]*)>' # <me...@ex...> +#HTML_RE = r'(\<[^\>]*\>)' # <...> HTML_RE = r'(\<[a-zA-Z/][^\>]*\>)' # <...> -ENTITY_RE = r'(&[\#a-zA-Z0-9]*;)' # & +ENTITY_RE = r'(&[\#a-zA-Z0-9]*;)' # & class Pattern: def __init__ (self, pattern) : self.pattern = pattern - self.compiled_re = re.compile("^(.*)%s(.*)$" % pattern, re.DOTALL) + self.compiled_re = re.compile("^(.*?)%s(.*?)$" % pattern, re.DOTALL) def getCompiledRegExp (self) : return self.compiled_re ---------------------- Fix the spacing on the comments for AUTOLINK and #HTML. 
Also, since we're escaping things by turning them into hex character
references such as &#x60;, we also need to make sure that pattern can't
be entered by the user.  So we escape any &s into &#x26;, which means
that entities entered by the user (such as &amp;) will be, in escaped
form, &#x26;amp;

----------------------
@@ -700,7 +703,6 @@
         el.setAttribute('href', mailto)
         return el

-ESCAPE_PATTERN = SimpleTextPattern(ESCAPE_RE)
 NOT_STRONG_PATTERN = SimpleTextPattern(NOT_STRONG_RE)

 BACKTICK_PATTERN = BacktickPattern(BACKTICK_RE)
----------------------

We don't need the escape pattern, as mentioned above.

----------------------
@@ -767,6 +769,10 @@
         @param html: an html segment
         @returns : a placeholder string
         """
+        def hexunesc(m):
+            return "%c" % chr(int(m.group(0)[3:-1],16))
+        unescapeChars = r"&#x[0-9A-Fa-f]+;"
+        html = re.sub( unescapeChars, hexunesc, html )
         self.rawHtmlBlocks.append(html)
         placeholder = HTML_PLACEHOLDER % self.html_counter
         self.html_counter += 1
----------------------

So, in the HtmlStash, we need to unescape things before we stash them,
because they won't be part of TextNodes (which is the other place that
does the unescaping).

----------------------
@@ -946,8 +952,7 @@

         self.inlinePatterns = [ DOUBLE_BACKTICK_PATTERN,
                                 BACKTICK_PATTERN,
-                                ESCAPE_PATTERN,
-                               IMAGE_LINK_PATTERN,
+                                IMAGE_LINK_PATTERN,
                                 IMAGE_REFERENCE_PATTERN,
                                 REFERENCE_PATTERN,
                                 LINK_ANGLED_PATTERN,
----------------------

We don't need the escape pattern.  And I fixed the spacing on the
IMAGE_LINK line while I was there.

----------------------
@@ -1043,6 +1048,14 @@
         # Split into lines and run the preprocessors that will work with
         # self.lines

+        def hexesc(m):
+            if m.group(1):
+                return "&#x%x;" % ord(m.group(1))
+            else:
+                return "&#x%x;" % ord('&')
+
+        escapeChars = r"\\([\\`*_{}\[\]()#+.!-])|&"
+        text = re.sub( escapeChars, hexesc, text )
         self.lines = text.split("\n")

         # Run the pre-processors on the lines
----------------------

And finally, in _transform, before we split into lines, we translate
the various escaped characters into their escaped form.
The regex limits the escape characters to the ones listed in
http://daringfireball.net/projects/markdown/syntax#backslash
specifically:

\   backslash
`   backtick
*   asterisk
_   underscore
{}  curly braces
[]  square brackets
()  parentheses
#   hash mark
+   plus sign
-   minus sign (hyphen)
.   dot
!   exclamation mark

and we also escape & (at the end of the regex), for the reasons
mentioned above.

Uh, thanks for reading this far.  I hope it all made sense.  Please let
me know if you found it helpful, or if I was totally wasting my time. ;)

Later,
Blake.
|
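The escape/unescape round trip from this patch can be sketched outside markdown.py. It combines the _transform pre-pass with the TextNode/HtmlStash unescaping. The two regexes are taken from the patch. The `escape()` and `unescape()` wrapper names are hypothetical, and the code assumes Python 3.

```python
import re

# Characters Markdown allows to be backslash-escaped, plus a bare '&'
# (same regex as the _transform hunk in the patch).
ESCAPE_CHARS = r"\\([\\`*_{}\[\]()#+.!-])|&"
# Hex character references produced by the escaping step.
HEX_REF = r"&#x([0-9A-Fa-f]+);"


def hexesc(m):
    # group(1) is None when the '&' alternative matched.
    char = m.group(1) if m.group(1) is not None else "&"
    return "&#x%x;" % ord(char)


def hexunesc(m):
    return chr(int(m.group(1), 16))


def escape(text):
    """Pre-pass: hide escaped characters (and every '&') from the inline patterns."""
    return re.sub(ESCAPE_CHARS, hexesc, text)


def unescape(text):
    """Post-pass: turn hex references back into the characters they stand for."""
    return re.sub(HEX_REF, hexunesc, text)


print(escape(r"\*not emphasis\*"))  # &#x2a;not emphasis&#x2a;
print(escape("AT&T"))               # AT&#x26;T
# A user who literally types "&#x60;" gets it back unchanged,
# because the leading '&' was itself escaped first.
print(unescape(escape("&#x60;")))   # &#x60;
```

The last case is why the patch also escapes &. Without that, unescape() would turn the user's literal &#x60; into a backtick.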
From: Blake W. <bw...@la...> - 2008-02-16 04:04:21
|
Waylan Limberg wrote:
> Thanks for the report. However, this makes it impossible to represent
> the string of an escaped backtick "\`" in a code span.

Argh.  :)

>>>> s = "`code with a \` in it`"
> A double backtick won't work either.
>>>> s = "`` code with a \` in it``"
> This highlights the problem.

Okay, I've gotten a little closer with the attached patch.

In [7]: print markdown.markdown( "`code with a \` in it`" ).strip()
<p><code>code with a ` in it</code>
</p>

In [8]: print markdown.markdown( "`` code with a \` in it``" ).strip()
<p><code>code with a ` in it</code>
</p>

In [9]: print markdown.markdown( "`` code with a \`` in it``" ).strip()
<p><code>code with a `` in it</code>
</p>

> The string is being broken into two
> strings with a textnode containing the escaped backtick between. We
> can't run a pattern across two strings. The patterns code would need
> to be completely rewritten to fix that.

I think that way would lead to madness.  It seems like it might be
simpler if the escaping mechanism were just a simple function that you
could call from a lot of places, many times per line.

> Hmm, while typing this it occurred to me that we should be able to
> escape the escape character. I believe that presently, this would also
> create that textnode between two strings though. Maybe the escape
> regex could be reworked. I'll see what I can do.

I don't think it really fits in as a pattern, since it can happen
multiple times per line, and in the middle of other patterns.  Perhaps
it should be on a lower level, i.e. in createTextNode, or something?
The worst part of that, I believe, is that it would mess up the parsing
of all the other patterns, since they'd have to avoid breaking on, say,
\\` for the code case.

Hey, what if the escape character turned the following character into
its hex-escape, as a pre-transformation?  Something along these lines:

In [4]: def hexesc(m):
   ...:     return "&#x%x;" % ord(m.group(1))
   ...:

In [5]: re.sub( r"\\(.)", hexesc, "abc \\ def" )
Out[5]: 'abc &#x20;def'

and then take that string, and run it through the patterns?

Later,
Blake.
|
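Blake's pre-transformation idea can be tried on Waylan's failing example without touching markdown.py. The sketch below assumes Python 3 and uses only `re`. It escapes first, then applies a single-backtick regex, then unescapes inside each span. It uses the broad `r"\\(.)"` escape from the message rather than a restricted character set. `code_span_contents()` is a hypothetical helper, not a markdown.py function.

```python
import re

BACKTICK_RE = r'`([^`]*)`'  # single-backtick code span
HEX_REF = r"&#x([0-9A-Fa-f]+);"


def hexesc(m):
    # Blake's suggestion: replace "\c" with a hex reference for "c".
    return "&#x%x;" % ord(m.group(1))


def hexunesc(m):
    return chr(int(m.group(1), 16))


def code_span_contents(source):
    """Return the text of each `...` span, honouring backslash escapes."""
    # 1. Hide every backslash-escaped character behind a hex reference.
    escaped = re.sub(r"\\(.)", hexesc, source)
    # 2. The backtick pattern now only sees unescaped backticks.
    spans = re.findall(BACKTICK_RE, escaped)
    # 3. Restore the hidden characters inside each span.
    return [re.sub(HEX_REF, hexunesc, span) for span in spans]


waylan_case = r"`code with a \` in it`"
# Without the pre-pass, the escaped backtick closes the span early.
print(re.findall(BACKTICK_RE, waylan_case))  # ['code with a \\']
print(code_span_contents(waylan_case))       # ['code with a ` in it']
```

Because the pre-pass consumes a backslash together with the character after it, an escaped backslash (\\) also comes through as a single literal backslash. That covers the "escape the escape character" case Waylan raised.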
From: Waylan L. <wa...@gm...> - 2008-02-16 00:07:16
|
Thanks for the report. However, this makes it impossible to represent
the string of an escaped backtick "\`" in a code span.

>>> import markdown
>>> s = "`code with a \` in it`"
>>> markdown.markdown(s)
<p>`code with a ` in it\n<p>

A double backtick won't work either. And yes, I realize that the
double backtick should eliminate the need to escape, but what about
when we are trying to represent Markdown source in our document?

>>> s = "`` code with a \` in it``"
<p><code></code>code with a ` in it<code></code>\n</p>

This highlights the problem. The string is being broken into two
strings with a textnode containing the escaped backtick between. We
can't run a pattern across two strings. The patterns code would need
to be completely rewritten to fix that.

Hmm, while typing this it occurred to me that we should be able to
escape the escape character. I believe that presently, this would also
create that textnode between two strings though. Maybe the escape
regex could be reworked. I'll see what I can do.

-- 
----
Waylan Limberg
wa...@gm...
|
From: Yuri T. <qar...@gm...> - 2008-02-15 22:35:07
|
Thanks!
|
From: Blake W. <bw...@la...> - 2008-02-15 22:30:15
|
(Apologies if any of you get this twice, I had a hiccup with my mail server.)
(Uh, third time's a charm?)

Looking at http://daringfireball.net/projects/markdown/syntax.php#backslash

    Markdown provides backslash escapes for the following characters:
    [...]
    `   backtick

And yet, when I try it (using markdown-1.6-py2.4.egg):

-----------
>>> from markdown import markdown
>>> x = "\\`This should not be in code.\\`"
>>> print x
\`This should not be in code.\`
>>> print markdown( x )

<p>\<code>This should not be in code.\</code>
</p>

-----------

So I edited the code a little, like so (also attached):

-----------
=== modified file 'markdown.py'
--- markdown.py 2008-02-15 21:28:18 +0000
+++ markdown.py 2008-02-15 21:28:30 +0000
@@ -944,9 +944,9 @@
         self.prePatterns = []


-        self.inlinePatterns = [ DOUBLE_BACKTICK_PATTERN,
+        self.inlinePatterns = [ ESCAPE_PATTERN,
+                                DOUBLE_BACKTICK_PATTERN,
                                 BACKTICK_PATTERN,
-                                ESCAPE_PATTERN,
                                 IMAGE_LINK_PATTERN,
                                 IMAGE_REFERENCE_PATTERN,
                                 REFERENCE_PATTERN,
-----------

and that gives me:

-----------
>>> from markdown import markdown
>>> x = "\\`This should not be in code.\\`"
>>> print x
\`This should not be in code.\`
>>>
>>> print markdown( x )

<p>`This should not be in code.`
</p>

-----------

Which I believe is closer to the spec.

Thanks,
Blake.
|
From: Trent M. <tr...@gm...> - 2008-02-07 18:58:54
|
> On Feb 4, 2008 7:50 PM, Blake Winton <bw...@la...> wrote:
> > Hi,
> >
> > uh, you don't know me, but I'm investigating switching a home-grown wiki
> > parser over to use Markdown syntax instead.  Since everything else is
> > written in Python, I'm mainly considering python-markdown and
> > python-markdown2.  I think you guys have the edge in terms of my being
> > able to add wiki links, and generically extend it,
>
> I'd say you nailed it. That about sums up the difference. With
> python-markdown2, you'd need to hack the core to add or change
> behavior. Personally, I'd much rather use python-markdown's extension
> api. In fact if you run into a few dead ends, let us know. We may be
> able to improve the api to make more things possible.

Blake,

I missed the start of this thread, so I may be mistaken about what you
are trying to do, but markdown2.py has a "link-patterns" feature that
you might be able to use for auto-linking WikiWords (I'm not sure if
that is the wiki syntax you are using):

>>> import re, markdown2
>>> link_patterns = [
...     # Match a wiki page link LikeThis.
...     (re.compile(r"(\b[A-Z][a-z]+[A-Z]\w+\b)"), r"/\1")
... ]
>>> processor = markdown2.Markdown(extras=["link-patterns"],
...                                link_patterns=link_patterns)
>>> wiki_page = """
... # This is my WikiPage!
...
... This is AnotherPage and YetAnotherPage.
... """
>>> print processor.convert(wiki_page)
<h1>This is my <a href="/WikiPage">WikiPage</a>!</h1>

<p>This is <a href="/AnotherPage">AnotherPage</a> and <a href="/YetAnotherPage">YetAnotherPage</a>.</p>

More details here:
http://code.google.com/p/python-markdown2/wiki/LinkPatterns

Cheers,
Trent

-- 
Trent Mick
tr...@gm...
|
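Trent's example needs markdown2 installed. The stdlib-only sketch below (Python 3) shows which words his WikiWord regex actually picks up. The regex needs at least two capitalised parts, so single words like Markdown and all-caps words like HTML are skipped. The `re.sub` link markup only illustrates the `r"/\1"` href. It is not how markdown2's link-patterns extra builds its output.

```python
import re

# The WikiWord regex from Trent's markdown2 example.
WIKIWORD_RE = re.compile(r"(\b[A-Z][a-z]+[A-Z]\w+\b)")

text = "This is AnotherPage and YetAnotherPage, but not Markdown or HTML."

# Which words would be treated as wiki links?
print(WIKIWORD_RE.findall(text))  # ['AnotherPage', 'YetAnotherPage']

# A plain-string stand-in for the link markdown2 generates from r"/\1".
linked = WIKIWORD_RE.sub(r'<a href="/\1">\1</a>', text)
print(linked)
```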
From: Yuri T. <qar...@gm...> - 2008-02-07 03:15:58
|
On Mon, Feb 4, 2008 at 8:37 PM, Waylan Limberg <wa...@gm...> wrote:
> I'd say you nailed it. That about sums up the difference. With
> python-markdown2, you'd need to hack the core to add or change
> behavior. Personally, I'd much rather use python-markdown's extension
> api. In fact if you run into a few dead ends, let us know. We may be
> able to improve the api to make more things possible.

Hey, let a hundred flowers bloom...  I haven't been following
python-markdown2 development closely enough to say how much of an
effort it would be to hack it.  But Waylan is right that
python-markdown was designed with the assumption that people would
want something-but-not-quite like the standard behavior, and that
extensibility and readability are more important than 100% compliance
with the Perl implementation or performance.  I've tried to keep both
in the acceptable range (and Waylan really helped with this recently),
but it hasn't been top priority.  Maybe at some point someone will find
time to combine the best of both worlds...

> > but I was wondering
> > if you had any suggestions of wikis that were using your project?

http://infogami.org/

We almost moved the python-markdown wiki to infogami, but then this
got stalled because I didn't have time to work out some minor issues.

BTW, I am wondering if we should just add a pattern for wikilinks,
perhaps [[...]], into markdown.py so that one would be able to activate
it with just a flag or something.

> > definition term
> > : definition description
> >
> > If you don't handle them, how hard do you think it would be to create a
> > plugin that would handle them?

Easy.  You should do it!

- yuri

-- 
http://sputnik.freewisdom.org/
|
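Yuri's [[...]] suggestion might look something like the following. This is a hypothetical Python 3 sketch using only `re`. The `WIKILINK_RE` regex, `page_url()`, and the `/wiki/` base path are illustrative assumptions. The sketch also substitutes strings directly, whereas a real python-markdown inline pattern would build document nodes.

```python
import re

# Hypothetical [[Page Name]] syntax; not an existing markdown.py pattern.
WIKILINK_RE = re.compile(r"\[\[([^\[\]]+)\]\]")


def page_url(label, base="/wiki/"):
    # Join the words of the label with underscores to form a URL path.
    return base + "_".join(label.split())


def link_wikilinks(text):
    def repl(m):
        label = m.group(1).strip()
        return '<a href="%s">%s</a>' % (page_url(label), label)
    return WIKILINK_RE.sub(repl, text)


print(link_wikilinks("See [[Front Page]] and [[Help]]."))
```

Single-bracket text such as [links] is left alone, so ordinary Markdown reference syntax would not be captured by this regex.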
From: Waylan L. <wa...@gm...> - 2008-02-05 14:20:03
|
On Feb 5, 2008 7:59 AM, Blake Winton <bw...@la...> wrote:
>
> >> Oh, and do you handle definition lists?  The examples I saw were all of
> >> the form:
> >> definition term
> >> : definition description
> >> If you don't handle them, how hard do you think it would be to create a
> >> plugin that would handle them?
> >
> > Unfortunately, that's the one thing I haven't tried to tackle yet. I
> > find the whole 'description gets the markup (:), not the term' a
> > little hard to wrap my mind around. I keep getting the feeling that
> > we'll need to make some changes to the core to get that one to work,
> > although I could be wrong. If you have any suggestions, I'm game.
>
> It seems to me (after a total of 5 minutes of thinking about it ;) that
> you could use a combination of pre- and post-processors.  The
> pre-processor would turn a line that precedes a line that starts with a
> colon (should be easy enough to find, just get lines[index-1]) into
> something more recognizable for the parser, and the post-processor could
> turn those things into dt/dd sets.  Actually, I think you could just
> turn them into lines of the form "<dt>...</dt>" and "<dd>...</dd>".

First, don't forget that you can associate more than one term with a
definition.

Second, python-markdown doesn't just substitute text for html. Rather,
the text is inserted into a DOM object, which is later output as html
when processing is complete. To do that, you need to be within the
core (after the preprocessors), although I suppose you could set them
aside and use a postprocessor for that. But then what happens when a
definition list is nested in an ordered or unordered list, or a
blockquote? Or what if such block-level elements are nested within a
definition?

What we need is an easier way to insert additional block-level parsers
into the core. This is something I've been chewing on for a while now,
and the reason I haven't tackled definition lists yet. I wanted to work
out this additional extension mechanism (which I've tentatively named
blockparsers) first. If anyone has any suggestions, I'm all ears.

-- 
----
Waylan Limberg
wa...@gm...
|
From: Waylan L. <wa...@gm...> - 2008-02-05 14:03:48
|
On Feb 5, 2008 3:20 AM, Herbert Poul <her...@gm...> wrote:
>
> although .. i don't know why the syntax should be weird ;) i actually
> copied it from another wiki engine i've used in the java world
> (radeox)
> the nice thing about it is that it makes it very easy to add additional
> macros: http://yourhell.com/wsvn/root/django/communitytools/trunk/sphenecoll/sphene/sphwiki/wikimacros.py
> .. without dealing with strings or anything ..
>

Sorry Herbert. Perhaps that was a poor choice of words. I just don't
like macros in general. Nothing personal against yours. I like Markdown
because it's easier to read and edit than html, with little to no
markup (the very reason M.G. called it mark*down*). IMO, macros take
away from that and don't add any value that *I* want. In fact, I don't
even use the wikilinks extension, which I wrote and maintain.

-- 
----
Waylan Limberg
wa...@gm...
|
From: Blake W. <bw...@la...> - 2008-02-05 12:59:24
|
Waylan Limberg wrote:
> [1]: http://code.google.com/p/monk-wiki/
> [2]: http://sourceforge.net/mailarchive/forum.php?forum_name=python-markdown-discuss

I'll check these out (and the link that Herbert suggested).

>> Oh, and do you handle definition lists?  The examples I saw were all of
>> the form:
>> definition term
>> : definition description
>> If you don't handle them, how hard do you think it would be to create a
>> plugin that would handle them?
>
> Unfortunately, that's the one thing I haven't tried to tackle yet. I
> find the whole 'description gets the markup (:), not the term' a
> little hard to wrap my mind around. I keep getting the feeling that
> we'll need to make some changes to the core to get that one to work,
> although I could be wrong. If you have any suggestions, I'm game.

It seems to me (after a total of 5 minutes of thinking about it ;) that
you could use a combination of pre- and post-processors.  The
pre-processor would turn a line that precedes a line that starts with a
colon (should be easy enough to find, just get lines[index-1]) into
something more recognizable for the parser, and the post-processor could
turn those things into dt/dd sets.  Actually, I think you could just
turn them into lines of the form "<dt>...</dt>" and "<dd>...</dd>".

Thanks,
Blake.
|
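Blake's pre-processor idea can be sketched as a plain function that takes and returns a list of lines. That is the same shape of input markdown.py's preprocessors work on, though the exact preprocessor class interface should be checked against the version in use.

This is a minimal Python 3 sketch, not an existing extension, and `deflist_preprocess()` is a hypothetical name. It emits the raw <dt>/<dd> lines Blake proposes, wrapped in <dl>. It also lets several terms share descriptions, as Waylan points out they can. It does not address Waylan's other concerns:

- Definition lists nested inside other blocks are not handled.
- Because the output is raw HTML, Markdown formatting inside terms and descriptions would probably not be processed.

```python
def deflist_preprocess(lines):
    """Rewrite 'term' / ': description' groups as <dl>/<dt>/<dd> lines.

    Every consecutive non-blank line directly above a line starting with ':'
    is treated as a term, so several terms can share the same descriptions.
    """
    out = []
    i, n = 0, len(lines)
    while i < n:
        # Scan a run of non-blank lines that do not start with ':'.
        j = i
        while j < n and lines[j].strip() and not lines[j].startswith(":"):
            j += 1
        if j > i and j < n and lines[j].startswith(":"):
            out.append("<dl>")
            out.extend("<dt>%s</dt>" % term.strip() for term in lines[i:j])
            # Each following ':' line becomes one description.
            while j < n and lines[j].startswith(":"):
                out.append("<dd>%s</dd>" % lines[j][1:].strip())
                j += 1
            out.append("</dl>")
            i = j
        else:
            # Not a definition list: pass the line through unchanged.
            out.append(lines[i])
            i += 1
    return out


source = """Apple
: A fruit.

Orange
Citrus
: A color.
: Also a fruit.

Plain text""".split("\n")

for line in deflist_preprocess(source):
    print(line)
```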