From: Waylan L. <wa...@gm...> - 2010-02-22 18:04:48
On Mon, Feb 22, 2010 at 11:31 AM, Yuri Takhteyev <qar...@gm...> wrote:
> What happens is that \] is turned into placeholder, which gives us
> "<x\x02klzzwxh:0000\x03>". Then <x\x02klzzwxh:0000\x03> is picked up
> as an HTML pattern and stashed away, which prevents the placeholder
> from being replaced back with "]".
>
> I am guessing we should make HTML_RE more restrictive.

Actually, it is a little more complicated than that. We are removing raw HTML in two places. In a preprocessor, we find and remove block-level HTML (div, p, etc.). Then, in the inlinepatterns, we remove inline HTML (span, em, etc.). In fact, the inline pattern removes anything wrapped in <> that still remains in the document, which means it has to run after the inlinepatterns for autolink and automail. However, for reasons which should be obvious, escaping needs to happen before autolink and automail. At the same time, escaping should not be run on raw HTML.

I see two possible fixes:

(1) Move inline raw HTML detection to the preprocessor.
(2) Have the inline raw HTML pattern check for any placeholders and swap them out.

Option 2 would certainly be the easiest to implement. But there is a part of me that wants to do option 1. A little while back I actually started a new preprocessor that used Python's built-in, real HTML parser. It actually worked, until I included autolinks/automails in a document. That broken implementation is in a branch on Gitorious if anyone is interested. Unfortunately, I failed to commit when I had working code, and I don't remember what I did that broke it when trying (unsuccessfully) to work around the autolink issue. When I realized what I had done, I just committed everything as is and figured I'd come back to it later.

> - yuri
>
> On Mon, Feb 22, 2010 at 10:57 AM, Yuri Takhteyev <qar...@gm...> wrote:
>> Interesting.
>> Here is a much simpler test case triggering this:
>>
>> md.convert("<x\]>")
>>
>> Even without any extensions, using the version from git, we get:
>>
>> u'<x\x02klzzwxh:0000\x03>'
>>
>> - yuri
>>
>> On Mon, Feb 22, 2010 at 9:08 AM, Tom Ritter <to...@ri...> wrote:
>>> klzzwxh:0000

--
----
\X/ /-\ `/ |_ /-\ |\|
Waylan Limberg
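A minimal sketch of option (2), written as standalone Python rather than against Python-Markdown's actual classes. It assumes placeholders look like the one in the test case (`\x02klzzwxh:NNNN\x03`, where NNNN indexes a list of escaped characters); `PLACEHOLDER_RE`, `unescape_placeholders`, and `stash_raw_html` are hypothetical names, not the library's API. Because escaping should not be run on raw HTML, this sketch restores the original backslash escape rather than the bare character; which of the two to restore is a design choice the thread does not settle.

```python
import re

# ASSUMPTION: placeholder format inferred from the test-case output,
# i.e. STX + "klzzwxh:" + a 4-digit index + ETX.
STX, ETX = "\x02", "\x03"
PLACEHOLDER_RE = re.compile(STX + r"klzzwxh:(\d{4})" + ETX)


def unescape_placeholders(text, escaped_chars):
    """Replace each placeholder with the backslash escape it came from.

    `escaped_chars` is a hypothetical list in which index N holds the
    character that placeholder N stands in for (e.g. "]").
    """
    return PLACEHOLDER_RE.sub(
        lambda m: "\\" + escaped_chars[int(m.group(1))], text
    )


def stash_raw_html(raw_html, escaped_chars, stash):
    """Hypothetical inline raw-HTML handler implementing option (2).

    Placeholders are swapped out *before* the HTML is stashed, so the
    stored chunk never contains placeholder text.
    Returns the index of the stashed chunk.
    """
    stash.append(unescape_placeholders(raw_html, escaped_chars))
    return len(stash) - 1
```

With the test case from the thread, `stash_raw_html("<x\x02klzzwxh:0000\x03>", ["]"], stash)` stores `<x\]>`, so no placeholder is left stranded inside the stashed HTML.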