From: Artem <ne...@gm...> - 2008-06-04 09:34:31
Hi all,

So, I plan to do the following:

- Some code refactoring. There are a lot of ideas about it in Greg Wilson's review, and I have some ideas as well. For instance, the current Markdown DOM implementation differs from standard DOM libraries in a few ways: usually in Element.replaceChild the first argument is newNode and the second is oldNode, but in the Markdown implementation the first is oldNode and the second is newNode. Usually an Element's parent property is named "parentNode", but here it's just "parent", etc. I realize that this will break some extensions, but I think it will save people who already know some DOM library from having to read the code in the future.

- There is something to be done about inline patterns, but I haven't decided yet what the best way to fix them is. That was discussed on the list and there are some ideas. I thought of writing a syntax/lexical parser instead of the current inline patterns mechanism, but I think it would be a bit slower.

- I'll try to boost performance; I think choosing the right way to modify inline patterns is the best way to speed Markdown up.

- An extension for Crunchy to load files using the Markdown syntax.

- Test suite extension.

- Some additional documentation, maybe more examples of writing extension modules.

Also, I wrote to the Django community asking if they need anything special, but they said they need nothing Django-specific, only API stability. Someone suggested adding markdown extras, but Waylan said that this was already almost done.

Maybe while working I'll come up with other ideas for what else I can do.

As I understand it, the code must be compatible with Python 2.3, 2.4, and 2.5, right?
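For comparison, the standard DOM convention Artem mentions is what Python's own `xml.dom.minidom` implements: `replaceChild(newChild, oldChild)` takes the new node first, and a node's parent is reached through `parentNode`. A short standard-library illustration (the tags are arbitrary):

```python
from xml.dom.minidom import parseString

# Build a tiny document: <p><em>old</em></p>
doc = parseString("<p><em>old</em></p>")
p = doc.documentElement
old = p.firstChild

# Standard DOM argument order: replaceChild(newChild, oldChild)
new = doc.createElement("strong")
new.appendChild(doc.createTextNode("new"))
p.replaceChild(new, old)

# The parent is exposed under the standard name "parentNode"
print(new.parentNode is p)  # True
print(p.toxml())            # <p><strong>new</strong></p>
```

Matching these names and argument orders in NanoDOM would let anyone who knows minidom (or the W3C DOM) read extension code without checking the Markdown-specific conventions first.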
From: Waylan L. <wa...@gm...> - 2008-06-04 13:25:31
Hi Artem,

On Wed, Jun 4, 2008 at 5:34 AM, Artem <ne...@gm...> wrote:
> Hi all,
>
> So, I plan to do the following:
>
> - Some code refactoring, there are a lot of ideas about it in Greg
> Wilson's review, and I as well have some ideas.
> For instance, in current Markdown DOM implementation there are some
> differences from standard DOM libraries, for example usually in
> Element.replaceChild the first argument is newNode and the second is
> oldNode, in Markdown implementation first is oldNode and the second is
> newNode. Usually Element's parent property name is "parentNode", but
> here it's just "parent" etc.
> I realize that it'll break some extensions, but I think it'll help
> people in future to avoid reading code if they already know some DOM
> library.

Don't worry too much about breaking old extensions. If you refactor the inline patterns (as noted below), extensions will need to be updated anyway. Just make sure your changes are actually useful, not just a new color for the bikeshed.

>
> - There is something to do with Inline patterns, I didn't decide yet
> what is the best way to fix it. That was discussed in list and there are
> some ideas. I thought of writing syntax/lexical parser instead of
> current Inline Patterns mechanism, but I think it'll be a bit slower.
>
> - I'll try to boost performance, I think choosing the right way of
> inline patterns modification is the best way to boost Markdown.

I agree. I'd really encourage you to spend the most time on the inline pattern issue. That's where markdown.py needs the most help, IMO.

>
> - An extension for Crunchy to load files using the Markdown syntax.
>
> - Test suite extension
>
> - Some additional documentation, maybe adding more examples about
> writing extension modules.

I've been meaning to do this myself, but you're more than welcome to.

>
> Also I wrote to Django community asking if they need something special,
> but they said that nothing Django-specific, but API stability. Someone
> suggested adding markdown extras, but Waylan said that it was already
> almost done.

Just be sure to update any extensions in the repo that you break. Feel free to check with me on any of them, as I wrote most of them.

>
> Maybe while working I'll find some other ideas, of what else I can do.
>
> As I understand code must be compatible with python 2.3, 2.4, 2.5, isn't
> it?

Yes, that is correct.

--
----
Waylan Limberg
wa...@gm...
From: Artem <ne...@gm...> - 2008-06-05 16:36:10
Waylan Limberg wrote:
> Don't worry to much about breaking old extensions. If you refactor the
> inline patterns (as noted below), extensions will need to be updated
> anyway. Just make sure your changes are actually useful, not just a
> new color for the bikeshed.
>
Ok.

>> - There is something to do with Inline patterns, I didn't decide yet
>> what is the best way to fix it. That was discussed in list and there are
>> some ideas. I thought of writing syntax/lexical parser instead of
>> current Inline Patterns mechanism, but I think it'll be a bit slower.
>>
>> - I'll try to boost performance, I think choosing the right way of
>> inline patterns modification is the best way to boost Markdown.
>>
>
> I agree. I'd really encourage you to spend the most time on the inline
> pattern issue. That's where markdown.py needs the most help IMO.
>
Yep, I've already realized that.

> Just be sure to update any extensions in the repo that you break. Feel
> free to check with me on any of them as I wrote most of them.
>
Ok, thanks.
From: Yuri T. <qar...@gm...> - 2008-06-04 17:55:04
> - Some code refactoring, there are a lot of ideas about it in Greg
> Wilson's review, and I as well have some ideas.
> For instance, in current Markdown DOM implementation there are some
> differences from standard DOM libraries, for example usually in
> Element.replaceChild the first argument is newNode and the second is
> oldNode, in Markdown implementation first is oldNode and the second is
> newNode. Usually Element's parent property name is "parentNode", but
> here it's just "parent" etc.
> I realize that it'll break some extensions, but I think it'll help
> people in future to avoid reading code if they already know some DOM
> library.

It's OK to break backward compatibility with extensions if this buys us enough. Similarly, we can think about dropping Python 2.3 support, again only if this really buys us something big. As far as the tree representation goes, we can consider a couple of options:

1. Stick with NanoDOM, fixing the problems you mentioned
2. Switch to ElementTree

Let's discuss those options. (Or others.)

(Artem: can you make a page for this project on the wiki to keep track of the questions and the decisions, and then start separate threads for each question, though probably not all at the same time?)

> - There is something to do with Inline patterns, I didn't decide yet
> what is the best way to fix it. That was discussed in list and there are
> some ideas. I thought of writing syntax/lexical parser instead of
> current Inline Patterns mechanism, but I think it'll be a bit slower.

I agree with Waylan, this should be the focus. I would avoid trying to do serious parsing. I think this part might be best done using straight regular expressions, and we might even be able to "steal" a lot of code from Trent's markdown2. In other words, my suggestion is that the first round of parsing should turn the document into a tree of blocks, where nodes in the tree represent individual simple paragraphs, list items, block quotes, code segments, block-level HTML elements, etc. The client will then ideally be able to get this tree back if they want. The second round of parsing would then simply go through this tree and run a different set of regular expressions on each node, depending on the type of the node.

If Python had a good PEG implementation, it would perhaps make sense to consider rewriting markdown using PEG. But at this point I think it's premature.

> - I'll try to boost performance, I think choosing the right way of
> inline patterns modification is the best way to boost Markdown.

Yes, let's do inline patterns first and see what this gives us. Another thing that could be done is avoiding excessive recursion in block parsing. But let's do one thing at a time.

> - An extension for Crunchy to load files using the Markdown syntax.

Sounds good, but don't get too sidetracked on this. Also, we need to make sure there is someone on the Crunchy project who is actually interested in this and will make use of it.

> - Test suite extension

Yes, in fact, I would consider doing this _first_. I.e., it would be good to put together a unified test suite that combines all of our tests and Trent's and gives us a better idea of where the two implementations stand relative to each other.

BTW, I would urge you to make sure that your modified version of python markdown passes all of our tests (and an increasing number of Trent's) at least once a week. Let's avoid the "Version 2" problem.

> - Some additional documentation, maybe adding more examples about
> writing extension modules.

Let's put this off till later. If we are going to make serious changes, it makes more sense to work on the documentation after the work is done. But it would be good if you could at least document any changes that you end up making before the end of the summer.

> Also I wrote to Django community asking if they need something special,
> but they said that nothing Django-specific, but API stability. Someone
> suggested adding markdown extras, but Waylan said that it was already
> almost done.

Yes, let's assume that the basic API will stay the same.

> As I understand code must be compatible with python 2.3, 2.4, 2.5, isn't
> it?

We can reconsider this decision if there is a good reason. We just shouldn't take it lightly.

- yuri

--
http://sputnik.freewisdom.org/
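Yuri's two-round idea (a block tree first, then per-node-type regular expressions) can be illustrated with a toy sketch. Everything in it is an assumption for illustration: the `Block` class, the function names, and the tiny block grammar (paragraphs, `# ` headers, four-space-indented code) are hypothetical and do not reflect markdown.py's actual internals or markdown2's code.

```python
import re

class Block:
    """Hypothetical block node: a type tag plus its raw text."""
    def __init__(self, kind, text):
        self.kind = kind   # "p", "h1" or "code" in this toy grammar
        self.text = text

def blocks_from_source(source):
    """Round one: split the source into typed blocks; no inline markup is touched."""
    tree = []
    # Blank lines separate blocks; strip only newlines so code indentation survives.
    for chunk in re.split(r"\n\s*\n", source.strip("\n")):
        if not chunk.strip():
            continue
        lines = chunk.splitlines()
        if chunk.startswith("# "):
            tree.append(Block("h1", chunk[2:].strip()))
        elif all(line.startswith("    ") for line in lines):
            tree.append(Block("code", "\n".join(line[4:] for line in lines)))
        else:
            tree.append(Block("p", chunk))
    return tree

# Round two: each block type gets its own list of (regex, replacement) pairs.
# Strong is listed before emphasis so "**x**" is not consumed as "*x*".
INLINE = [
    (re.compile(r"\*\*(.+?)\*\*"), r"<strong>\1</strong>"),
    (re.compile(r"\*(.+?)\*"), r"<em>\1</em>"),
    (re.compile(r"`(.+?)`"), r"<code>\1</code>"),
]
PATTERNS_BY_KIND = {"p": INLINE, "h1": INLINE, "code": []}  # code blocks stay literal

def apply_inline(tree):
    """Run the patterns registered for each block's type, in order."""
    for block in tree:
        for pattern, replacement in PATTERNS_BY_KIND[block.kind]:
            block.text = pattern.sub(replacement, block.text)
    return tree

source = "# Title *here*\n\nSome **bold** and *em* text.\n\n    x = *not emphasis*"
for block in apply_inline(blocks_from_source(source)):
    print(block.kind, repr(block.text))
```

Real Markdown needs far more than this (nested lists, escaping, reference links), but the shape is the point: the block tree exists as a separate result that a client could get back before the second round runs, and code blocks are protected simply by registering no inline patterns for them.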
From: Sam's L. <sam...@gm...> - 2008-06-04 21:17:43
Hi... I hope no one minds me asking, but I think you should at least consider freezing the old code and having these changes target the upcoming Python 3, perhaps while also staying compatible with the upcoming 2.6. Python has changed, in good ways, since 2.3. Who are we targeting by being compatible with 2.3?

Thanks

On Wed, Jun 4, 2008 at 10:55 AM, Yuri Takhteyev <qar...@gm...> wrote:
> > As I understand code must be compatible with python 2.3, 2.4, 2.5, isn't
> > it?
>
> We can reconsider this decision, if there is a good reason. We just
> shouldn't take it lightly.
>
> <http://sputnik.freewisdom.org/>
From: Yuri T. <qar...@gm...> - 2008-06-04 21:50:44
> Python has changed---in good ways---since 2.3 Who are we targeting by being
> compatible with 2.3?

Anyone using python markdown on a shared hosting system that does not support the more recent versions of Python. For example, Dreamhost defaults to Python 2.3 at this point, and Python 2.4 is the highest version available there. If you want Python 2.5 on Dreamhost, you'll need to build it yourself. So, Python 2.3 support may be worth rethinking at this point, but I don't think it's something to be taken lightly. Note that Django works with Python 2.3.

We should definitely start testing the code with 2.6 once it's out. If someone wants to look into making sure that the code is compatible with Python 3000 (in the sense that it can be converted successfully via 2to3), then I am all for it, and I will check in any patches that are needed to make it compatible (assuming it still works with the earlier versions too). But I do not plan to look into Py3K myself at the moment, since I don't yet see much demand for it.

- yuri

--
http://sputnik.freewisdom.org/
From: Artem <ne...@gm...> - 2008-06-05 16:42:50
Yuri Takhteyev wrote:
> It's ok to break backwards compatibility with extensions, if this buys
> us enough. Similarly, we can think of dropping Python 2.3 support -
> again if this really buys us something big. In particular, as far as
> the tree representation goes, we can consider a couple of things, in
> particular:
>
> 1. Stick with NanoDOM, fixing the problems you mentioned
> 2. Switch to ElementTree
>
> Let's discuss those options. (Or other.)
>
I thought about ElementTree, but there is a problem with escaping entities, and I haven't found an elegant solution yet.

> (Artem: can you make a page for this project on the wiki, to keep
> track of the questions and the decisions, and then start separate
> threads for each question, though probably not all at the same time?)
>
Sure. Here it is: http://www.freewisdom.org/projects/python-markdown/GSoC2008

> In other words, my suggestion is that the first round of parsing
> should turn the document into a tree of blocks, where nodes in the
> tree represent individual simple paragraphs, list items, block quotes,
> code segments, block level HTML elements, etc. The client will then
> ideally be able to get this tree back if they want. The second round
> of parsing would then simply go through this tree and run a different
> set of regular expressions on each node depending on the type of the
> node.
>
Thanks for the suggestion. Do you mean that the first round of parsing should be done without regexps?

> Yes, let's first do inline patterns, and see what this gives us.
> Another thing that could be done is avoiding excessive recursion in
> block parsing. But let's do one thing at a time
Ok.

> Sounds good, but don't get too sidetracked on this. Also, we need to
> make sure there is someone on Crunchy project who is actually
> interested in this and will make use of it.
>
I think we should contact André Roberge.

>> - Test suite extension
>>
>
> Yes, in fact, I would consider doing this _first_. I.e., it would be
> good to put together a unified test suite that combines all of our and
> Trent's tests and gives us a better idea of where the two
> implementations stand relative to each other.
>
> BTW, I would urge you to make sure that your modified version of
> python markdown passes all of our tests (and an increasing number of
> Trent's) at least once a week. Let's avoid the "Version 2" problem.
>
Got it.

>> - Some additional documentation, maybe adding more examples about
>> writing extension modules.
>>
>
> Let's put this off till later. If we'll be making serious changes, it
> makes more sense to work on the documentation after the work is done.
> But it would be good if you could at least document any changes that
> you end up making before the end of the summer.
>
Ok.
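To make the entity problem concrete: ElementTree treats an element's `.text` as literal character data and escapes every `&` when serializing, so it cannot tell a bare ampersand (which should be escaped) from an HTML entity the author wrote on purpose (which Markdown should pass through). A minimal illustration with the standard library's `xml.etree.ElementTree`, run under a current Python 3 (the sample string is made up):

```python
import xml.etree.ElementTree as ET

p = ET.Element("p")
# Markdown lets authors write raw HTML entities such as &copy; in the source
p.text = "AT&T &copy; 2008"

html = ET.tostring(p, encoding="unicode")
print(html)  # <p>AT&amp;T &amp;copy; 2008</p>
```

The first ampersand is escaped correctly, but `&copy;` becomes `&amp;copy;`, which a browser displays literally. Note also that `xml.etree` only entered the standard library in Python 2.5, so under the 2.3/2.4 compatibility requirement it would have been an external dependency.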
From: Yuri T. <yu...@si...> - 2008-06-06 05:57:55
> I thought about ElementTree, but there is a problem with entities
> escaping and I I didn't find any beautiful solution yet.

Sure. I am not saying we should switch to ElementTree, just that it might be worth considering at this point. Sticking with NanoDOM and fixing the discrepancies from the minidom API might be less work.

> Thanks for the suggestion. Do you mean that the first round of parsing
> should be without regexps?

No, I didn't mean to say that we should avoid regexps in parsing, just that this wouldn't be a simple substitution. Avoiding regular expressions would be neither necessary nor feasible.

I would actually prefer that you keep the current parsing code as is for now. Right now the Markdown class has a single _transform() method, which takes Markdown source and returns HTML. _transform() calls _processSection() on the source, which is itself recursive. I wouldn't mind getting rid of this recursion, but let's not worry about it right now. Instead, let's extricate _handleInline from all of this. That is, instead of the single call to _transform, I would rather have two functions:

markdown_to_tree(markdown_source) - takes markdown, returns a NanoDOM tree, WITHOUT applying inline patterns

apply_inline_patterns(nanodom_tree) - takes a NanoDOM tree and applies inline patterns to all nodes that need it (returning either the modified tree or a copy of it)

So, one would be able to do the conversion with:

    m = Markdown()
    return m.apply_inline_patterns(m.markdown_to_tree(my_source)).to_xml()

Or maybe attach the second function as a method to NanoDOM:

    return m.markdown_to_tree(my_source).apply_inline_patterns().to_xml()

This would gain us two things. First, we'll have better separation of the code into two areas. I think this will make it easier to read and maintain, and it will put us in a good position to change how inline patterns are handled. Second, it will give the caller more options: they can do stuff to the tree before applying inline patterns. They could also come up with their own way of handling inline patterns.

- yuri

--
http://sputnik.freewisdom.org/
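A minimal sketch of the split Yuri proposes. The two function names come from his message, but they are proposals, not an existing python-markdown API; the `Node` class is a hypothetical stand-in for a NanoDOM tree, and the paragraph-only block phase and one-pattern inline phase are deliberate simplifications.

```python
import re

class Node:
    """Hypothetical stand-in for a NanoDOM element: a tag, some text, child nodes."""
    def __init__(self, tag, text="", children=None):
        self.tag = tag          # None marks the document root
        self.text = text
        self.children = children if children is not None else []

    def to_xml(self):
        inner = self.text + "".join(child.to_xml() for child in self.children)
        if self.tag is None:
            return inner
        return "<%s>%s</%s>" % (self.tag, inner, self.tag)

def markdown_to_tree(source):
    """Phase 1: build block nodes only; inline markup stays as raw text."""
    root = Node(None)
    for chunk in re.split(r"\n\s*\n", source.strip()):
        if chunk.strip():
            root.children.append(Node("p", chunk))
    return root

EMPHASIS = re.compile(r"\*(.+?)\*")

def apply_inline_patterns(tree):
    """Phase 2: rewrite the text of this node and all its descendants.
    (No HTML escaping is done here, for brevity.)"""
    tree.text = EMPHASIS.sub(r"<em>\1</em>", tree.text)
    for child in tree.children:
        apply_inline_patterns(child)
    return tree

# The caller can work on the block tree between the two phases,
# e.g. drop paragraphs that start with "TODO" before any inline markup runs.
source = "Hello *world*\n\nTODO: remove me\n\nBye"
tree = markdown_to_tree(source)
tree.children = [node for node in tree.children if not node.text.startswith("TODO")]
print(apply_inline_patterns(tree).to_xml())  # <p>Hello <em>world</em></p><p>Bye</p>
```

The filtering step in the middle is the kind of hook the split makes possible; a caller wanting its own inline handling could likewise call `markdown_to_tree` and skip `apply_inline_patterns` entirely.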