Re: [Python-markdown-discuss] GSoC plan

> - Some code refactoring. There are a lot of ideas about it in Greg
> Wilson's review, and I have some ideas as well.
> For instance, the current Markdown DOM implementation differs from
> standard DOM libraries in some ways. For example, in
> Element.replaceChild the first argument is usually newNode and the
> second is oldNode, but in the Markdown implementation the first is
> oldNode and the second is newNode. An Element's parent property is
> usually named "parentNode", but here it's just "parent", etc.
> I realize that changing this will break some extensions, but I think
> it will save people who already know a DOM library from having to
> read the code.

It's OK to break backwards compatibility with extensions if this buys
us enough.  Similarly, we can think about dropping Python 2.3 support,
again only if this really buys us something big.  As far as the tree
representation goes, we can consider a couple of options:

1. Stick with NanoDOM, fixing the problems you mentioned
2. Switch to ElementTree

Let's discuss those options.  (Or others.)
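
To make the comparison concrete, here is a rough sketch of the same
"replace a child" operation in a standard DOM library and in
ElementTree.  It uses the standard library's xml.dom.minidom as the
stand-in for "standard DOM" and assumes a current Python; it is only an
illustration, not a proposal for how our own code should look.

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET

# Standard DOM (xml.dom.minidom): replaceChild(newChild, oldChild).
doc = minidom.parseString("<p><em>old</em></p>")
para = doc.documentElement
old = para.firstChild
new = doc.createElement("strong")
new.appendChild(doc.createTextNode("new"))
para.replaceChild(new, old)        # new node first, old node second
assert new.parentNode is para      # the parent link is named parentNode
dom_result = para.toxml()

# ElementTree: an element behaves like a list of its children, so there
# is no replaceChild and no parent link; we assign by index instead.
root = ET.fromstring("<p><em>old</em></p>")
strong = ET.Element("strong")
strong.text = "new"
root[0] = strong
et_result = ET.tostring(root, encoding="unicode")
```

Note that with ElementTree an extension that needs the parent of a node
has to keep track of it itself, which is worth weighing against the
simpler API.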

(Artem: can you make a page for this project on the wiki, to keep
track of the questions and the decisions, and then start separate
threads for each question, though probably not all at the same time?)

> - Something needs to be done about Inline patterns; I haven't decided
> yet what the best way to fix them is. That was discussed on the list
> and there are some ideas. I thought of writing a syntax/lexical parser
> to replace the current Inline Patterns mechanism, but I think it would
> be a bit slower.

I agree with Waylan: this should be the focus.  I would avoid trying
to do serious parsing.  I think this part might be best done using
straight regular expressions, and we might even be able to "steal" a
lot of code from Trent's markdown2.

In other words, my suggestion is that the first round of parsing
should turn the document into a tree of blocks, where nodes in the
tree represent individual simple paragraphs, list items, block quotes,
code segments, block level HTML elements, etc.  The client will then
ideally be able to get this tree back if they want.  The second round
of parsing would then simply go through this tree and run a different
set of regular expressions on each node depending on the type of the
node.
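
A very rough sketch of what I mean by two rounds is below.  The block
types, the patterns and the function names are all made up for the
example, and a flat list of blocks stands in for the tree (nesting for
lists and block quotes is left out), so treat it as an illustration of
the idea rather than a design.

```python
import html
import re

# Hypothetical inline patterns per block type, for illustration only.
INLINE_PATTERNS = {
    "paragraph": [
        (re.compile(r"\*\*(.+?)\*\*"), r"<strong>\1</strong>"),
        (re.compile(r"\*(.+?)\*"), r"<em>\1</em>"),
    ],
    "code": [],  # code blocks get no inline processing at all
}


def parse_blocks(text):
    """First round: split the document into (type, text) block nodes."""
    blocks = []
    for chunk in re.split(r"\n\s*\n", text.strip("\n")):
        lines = chunk.split("\n")
        if all(line.startswith("    ") for line in lines):
            body = "\n".join(line[4:] for line in lines)
            blocks.append(("code", body))
        else:
            body = " ".join(line.strip() for line in lines)
            blocks.append(("paragraph", body))
    return blocks


def render(blocks):
    """Second round: run the patterns that belong to each block type."""
    out = []
    for kind, body in blocks:
        body = html.escape(body, quote=False)
        for pattern, replacement in INLINE_PATTERNS[kind]:
            body = pattern.sub(replacement, body)
        if kind == "code":
            out.append("<pre><code>%s</code></pre>" % body)
        else:
            out.append("<p>%s</p>" % body)
    return "\n".join(out)
```

The client can stop after parse_blocks() if all they want is the block
structure, and the inline work for a code block is skipped just by
giving that type an empty pattern list.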

If Python had a good PEG implementation, it perhaps would make sense
to consider rewriting markdown using PEG.  But at this point I think
it's premature.

> - I'll try to boost performance, I think choosing the right way of
> inline patterns modification is the best way to boost Markdown.

Yes, let's first do inline patterns, and see what this gives us.
Another thing that could be done is avoiding excessive recursion in
block parsing.  But let's do one thing at a time.

> - An extension for Crunchy to load files using the Markdown syntax.

Sounds good, but don't get too sidetracked on this.  Also, we need to
make sure there is someone on the Crunchy project who is actually
interested in this and will make use of it.

> - Test suite extension

Yes, in fact, I would consider doing this _first_.  I.e., it would be
good to put together a unified test suite that combines all of our
tests with Trent's and gives us a better idea of where the two
implementations stand relative to each other.
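
Something along these lines would be enough to start with.  The
converters here are stand-ins and the case format (name, source,
expected output) is an assumption; the real suite would read the cases
from our test files and Trent's and plug in the two actual
implementations.

```python
def run_suite(converters, cases):
    """Run every case through every converter.

    Returns a dict mapping each converter name to the list of case
    names it failed, so the implementations can be compared directly.
    """
    failures = {name: [] for name in converters}
    for case_name, source, expected in cases:
        for name, convert in converters.items():
            try:
                ok = convert(source).strip() == expected.strip()
            except Exception:
                ok = False  # a crash counts as a failed case
            if not ok:
                failures[name].append(case_name)
    return failures


# Stand-in converters, only here to make the example runnable.
def impl_a(text):
    return "<p>%s</p>" % text.strip()


def impl_b(text):
    return "<p>%s</p>\n" % text.strip().upper()


cases = [
    ("plain", "hello\n", "<p>hello</p>\n"),
    ("upper", "HI\n", "<p>HI</p>\n"),
]
report = run_suite({"impl_a": impl_a, "impl_b": impl_b}, cases)
```

Running the same report regularly would also show whether the number
of failing cases is going up or down over the summer.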

BTW, I would urge you to make sure that your modified version of
Python Markdown passes all of our tests (and an increasing number of
Trent's) at least once a week.  Let's avoid the "Version 2" problem.

> - Some additional documentation, maybe adding more examples about
> writing extension modules.

Let's put this off till later.  If we end up making serious changes, it
makes more sense to work on the documentation after the work is done.
But it would be good if you could at least document any changes that
you end up making before the end of the summer.

> Also, I wrote to the Django community asking if they need anything
> special, and they said they need nothing Django-specific, only API
> stability.  Someone suggested adding markdown extras, but Waylan said
> that it was already almost done.

Yes, let's assume that the basic API will stay the same.

> As I understand it, the code must be compatible with Python 2.3, 2.4,
> and 2.5, right?

We can reconsider this decision, if there is a good reason.  We just
shouldn't take it lightly.

 - yuri

-- 
http://sputnik.freewisdom.org/