From: Yuri T. <qar...@gm...> - 2008-02-19 08:07:20
> The easy solution is to reverse the order of the inlinePatterns. But
> then we can't do the first example as the link syntax is broken up in
> the same way. Now, if no one ever uses that syntax, that would be
> fine. Of course, both should work, so we may need a new approach to
> the inlinePatterns. Any ideas?

I've thought about this issue before, and I think there are basically two
solutions (apart from the zeroth solution of just dealing with it).

One, somewhat tricky, would be to implement some kind of data structure
that contains a mixture of strings and DOM nodes and works with REs. It's
not impossible; I got halfway there implementing it in 2006 but then
didn't have time to finish. What I tried at the time was storing a string
that uses a special Unicode character to mark the positions where the
nodes are supposed to be included. I.e., if "⊙" is the special character,
we could store something like:

    ["A **⊙** currently does not work.", <link>]

This would allow us to run REs (if we are careful) and still get the DOM
tree in the end.

Another possibility is to use DOM trees only for high-level elements
(lists, code blocks, quotes, etc.) and reduce inline patterns to simple
REs, each run on one element of the larger tree at a time. The second
solution would break some old extensions, but I think it's overall
simpler and better. To give credit where credit is due, this is basically
Ben Wilson's suggestion from last summer:

https://sourceforge.net/mailarchive/forum.php?thread_name=cc6097050704100456x4daa81f0i9ca0137b6c484ba4%40mail.gmail.com&forum_name=python-markdown-discuss

I don't have time at the moment for such a major overhaul (this would
basically be Python-Markdown 2.0), but if someone else does, then I think
this is the way to go. I am also pretty sure that this would give us a
sizeable performance boost.

- yuri
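For illustration, here is a minimal Python sketch of the placeholder-string idea from the message: a string paired with a list of nodes, where each "⊙" in the string stands for the next node in the list. It is not Python-Markdown's actual code. ElementTree stands in for the DOM, and every name here (PLACEHOLDER, fill, apply_pattern, render, LINK, STRONG) is hypothetical. It also assumes the placeholder never appears in real input and that, within a match, placeholders sit only in group 1.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical marker meaning "a node goes here"; must never occur in real input.
PLACEHOLDER = "\u2299"  # "⊙"


def fill(parent, text, nodes):
    """Write `text` into `parent`, swapping each placeholder for the next node."""
    pieces = text.split(PLACEHOLDER)
    assert len(pieces) == len(nodes) + 1, "placeholder/node count mismatch"
    parent.text = pieces[0]
    for node, tail in zip(nodes, pieces[1:]):
        node.tail = tail
        parent.append(node)
    return parent


def apply_pattern(text, nodes, pattern, build):
    """Run one inline regex over the mixed string.

    Each match becomes a single element (from `build`) and collapses to one
    placeholder, so `nodes` always lines up with the placeholders in `text`.
    Assumes any placeholders inside a match fall within group 1.
    """
    out_text, out_nodes = [], []
    pos = node_i = 0
    for m in pattern.finditer(text):
        before = text[pos:m.start()]
        k = before.count(PLACEHOLDER)          # nodes passed over untouched
        out_nodes.extend(nodes[node_i:node_i + k])
        node_i += k
        j = m.group(0).count(PLACEHOLDER)      # nodes swallowed by this match
        out_nodes.append(build(m, nodes[node_i:node_i + j]))
        node_i += j
        out_text.append(before + PLACEHOLDER)
        pos = m.end()
    out_text.append(text[pos:])
    out_nodes.extend(nodes[node_i:])
    return "".join(out_text), out_nodes


# Simplified inline patterns for the sketch, not the library's real ones.
LINK = re.compile(r"\[([^\]]*)\]\(([^)\s]*)\)")
STRONG = re.compile(r"\*\*(.+?)\*\*")


def build_link(m, inner):
    return fill(ET.Element("a", href=m.group(2)), m.group(1), inner)


def build_strong(m, inner):
    return fill(ET.Element("strong"), m.group(1), inner)


def render(source):
    """Apply the link pattern, then the strong pattern, and wrap in <p>."""
    if PLACEHOLDER in source:
        raise ValueError("input contains the reserved placeholder character")
    text, nodes = source, []
    for pattern, build in [(LINK, build_link), (STRONG, build_strong)]:
        text, nodes = apply_pattern(text, nodes, pattern, build)
    return fill(ET.Element("p"), text, nodes)


if __name__ == "__main__":
    p = render("A **[bold link](http://example.com)** example.")
    print(ET.tostring(p, encoding="unicode"))
    # <p>A <strong><a href="http://example.com">bold link</a></strong> example.</p>
```

After the link pass the pair is ("A **⊙** example.", [<a>]), the same shape as the example in the message, so the strong RE can still match around the link even though the link is already a node. One simplification to note: text placed inside a newly built element is not re-scanned by later patterns, so a real implementation would also need to recurse into element text.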