From: Yuri T. <qar...@gm...> - 2008-02-19 08:07:20
> The easy solution is to reverse the order of the inlinePatterns. But
> then we can't do the first example as the link syntax is broken up in
> the same way. Now, if no one ever uses that syntax, that would be
> fine. Of course, both should work, so we may need a new approach to
> the inlinePatterns. Any ideas?
I've thought about this issue before and I think there are basically
two solutions (apart from the zeroth solution of just dealing with
it). One, somewhat tricky, would be to implement some kind of data
structure that contains a mixture of strings and dom nodes and works
with REs. It's not impossible, and I got half-way there implementing it
in 2006, but then didn't have time to finish. What I tried at the time
was storing a string which uses a special Unicode character to mark the
positions where the nodes are supposed to be included. I.e., if "⊙"
is the special character, we could store something like:
["A **⊙** currently does not work.", <link>]
This would allow us to run REs (if we are careful) and still get the
dom tree in the end.
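
A minimal sketch of the placeholder idea, not the unfinished 2006 code:
it assumes "⊙" never occurs in real input, that stashed nodes are kept
in the order their placeholders appear in the string, and that any
placeholder inside a match sits in group 1. The names apply_pattern and
build_element are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

PLACEHOLDER = "\u2299"  # "⊙"; assumed never to appear in real input


def build_element(tag, text, children):
    """Build <tag> from text, putting one child at each placeholder."""
    parts = text.split(PLACEHOLDER)
    assert len(parts) == len(children) + 1, "placeholder/node mismatch"
    el = ET.Element(tag)
    el.text = parts[0]
    for child, tail in zip(children, parts[1:]):
        child.tail = tail  # text that follows the child inside <tag>
        el.append(child)
    return el


def apply_pattern(text, nodes, pattern, tag):
    """Replace each match of pattern with a placeholder for a new <tag>.

    Group 1 becomes the content of the new element; nodes whose
    placeholders fall inside group 1 become its children.
    """
    out_text, out_nodes = [], []
    pos = 0   # how far into text we have copied
    used = 0  # how many of the old nodes we have consumed
    for m in pattern.finditer(text):
        before = text[pos:m.start()]
        n_before = before.count(PLACEHOLDER)
        out_text.append(before)
        out_nodes.extend(nodes[used:used + n_before])
        used += n_before

        inner = m.group(1)
        n_inner = inner.count(PLACEHOLDER)
        out_nodes.append(
            build_element(tag, inner, nodes[used:used + n_inner]))
        used += n_inner
        out_text.append(PLACEHOLDER)
        pos = m.end()
    out_text.append(text[pos:])
    out_nodes.extend(nodes[used:])
    return "".join(out_text), out_nodes


# Start from the stored form shown above: the link is already a node.
link = ET.Element("a", href="http://example.com")
link.text = "link"
text, nodes = "A **\u2299** currently does not work.", [link]

STRONG = re.compile(r"\*\*(.+?)\*\*")
text, nodes = apply_pattern(text, nodes, STRONG, "strong")
# text is now "A ⊙ currently does not work." and nodes is [<strong>]

root = build_element("p", text, nodes)
html = ET.tostring(root, encoding="unicode")
```

The RE only ever sees a flat string, so the strong pattern matches
across the link, and the dom tree is recovered at the end by splitting
on the placeholder.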
Another possibility is to only use dom trees for high-level elements
(lists, code blocks, quotes, etc.), and to reduce inline patterns to
simple REs (each run on one element of the larger tree at a time).
The second solution would break some old extensions, but I think it's
overall simpler and better. To give credit where credit is due, this
is basically Ben Wilson's suggestion from last summer:
https://sourceforge.net/mailarchive/forum.php?thread_name=cc6097050704100456x4daa81f0i9ca0137b6c484ba4%40mail.gmail.com&forum_name=python-markdown-discuss
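
A rough sketch of that second option, under stated assumptions: the
block-level tree is an ElementTree whose text is still raw Markdown,
and inline patterns are plain (RE, replacement) pairs applied to one
element's text at a time. The pattern list and the render and
render_inline helpers are hypothetical, not Python-Markdown's API.

```python
import re
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical inline patterns: plain RE substitutions, applied in order.
INLINE_PATTERNS = [
    (re.compile(r"\[(.+?)\]\((.+?)\)"), r'<a href="\2">\1</a>'),
    (re.compile(r"\*\*(.+?)\*\*"), r"<strong>\1</strong>"),
    (re.compile(r"\*(.+?)\*"), r"<em>\1</em>"),
]


def render_inline(text):
    """Escape the raw text, then run every inline RE over the string."""
    html = escape(text, quote=False)
    for pattern, repl in INLINE_PATTERNS:
        html = pattern.sub(repl, html)
    return html


def render(elem):
    """Serialize a block-level tree, expanding inline markup per element."""
    inner = render_inline(elem.text or "")
    for child in elem:
        inner += render(child) + render_inline(child.tail or "")
    return "<%s>%s</%s>" % (elem.tag, inner, elem.tag)


# Only the block structure (a list with two items) is a dom tree.
ul = ET.Element("ul")
for item in ("A **[link](http://example.com)** works.",
             "So does [**this**](http://example.com)."):
    ET.SubElement(ul, "li").text = item

html = render(ul)
```

Because each pattern rewrites a string rather than splitting it into
nodes, a later pattern can still match across the output of an earlier
one, so both nesting orders come out right here. A real implementation
would need more care, e.g. for URLs that contain "*".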
I don't have time at the moment for such a major overhaul (this would
basically be Python-Markdown 2.0), but if someone else does then I
think this is the way to go. I am also pretty sure that this would
give us a sizeable performance boost.
- yuri