From: Waylan L. <wa...@gm...> - 2010-02-22 18:04:48
On Mon, Feb 22, 2010 at 11:31 AM, Yuri Takhteyev <qar...@gm...> wrote:
> What happens is that \] is turned into placeholder, which gives us
> "<x\x02klzzwxh:0000\x03>". Then <x\x02klzzwxh:0000\x03> is picked up
> as an HTML pattern and stashed away, which prevents the placeholder
> from being replaced back with "]".
>
> I am guessing we should make HTML_RE more restrictive.
Actually it is a little more complicated than that. We are removing
raw html in two places. In a preprocessor we find and remove block
level html (div, p, etc.). Then in the inlinepatterns we remove inline
html (span, em, etc) - actually the inline pattern removes anything
wrapped in <> that still remains in the document. Which means it has
to run after the inlinepatterns for autolink and automail. However,
for reasons which should be obvious, escaping needs to happen before
autolink and automail. At the same time, escaping should not be run on
raw html.
I see two possible fixes:
(1) Move inline raw html detection to the preprocessor.
(2) Have the inline raw html pattern check for any placeholders and
swap them out.
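
To make option 2 concrete, here is a minimal standalone sketch. It is not the actual Python-Markdown code: the function convert(), the flag unstash_in_html, and the simplified regexes are all made up for illustration, and only the placeholder format is taken from the output reported in this thread. It reproduces the stray placeholder for "<x\]>" and shows how swapping placeholders back before stashing the raw html avoids it. Here the swap restores the bare "]"; restoring the original "\]" instead would be a small variation.

```python
import re

STX, ETX = "\x02", "\x03"

# Simplified stand-ins for the real patterns (assumptions, not the library's regexes).
PLACEHOLDER_RE = re.compile(STX + r"klzzwxh:(\d{4})" + ETX)
ESCAPE_RE = re.compile(r"\\([\\`*_{}\[\]()#+\-.!])")
HTML_RE = re.compile(r"<[^<>]+>")
HTML_PLACEHOLDER_RE = re.compile(STX + r"html:(\d{4})" + ETX)


def convert(text, unstash_in_html):
    """Toy pipeline: escape -> inline raw html -> restore placeholders."""
    inline_stash = []
    html_stash = []

    def stash_inline(value):
        inline_stash.append(value)
        return "%sklzzwxh:%04d%s" % (STX, len(inline_stash) - 1, ETX)

    def unstash_inline(s):
        return PLACEHOLDER_RE.sub(lambda m: inline_stash[int(m.group(1))], s)

    # 1. The escape pattern runs first and replaces "\]" with a placeholder.
    text = ESCAPE_RE.sub(lambda m: stash_inline(m.group(1)), text)

    # 2. The inline raw html pattern stashes anything still wrapped in <>.
    def handle_html(m):
        raw = m.group(0)
        if unstash_in_html:
            # Option 2: swap any placeholders back before stashing.
            raw = unstash_inline(raw)
        html_stash.append(raw)
        return "%shtml:%04d%s" % (STX, len(html_stash) - 1, ETX)

    text = HTML_RE.sub(handle_html, text)

    # 3. Inline placeholders are restored in the remaining text only.
    text = unstash_inline(text)

    # 4. Raw html is put back verbatim, so anything left inside it stays.
    return HTML_PLACEHOLDER_RE.sub(lambda m: html_stash[int(m.group(1))], text)


buggy = convert(r"<x\]>", unstash_in_html=False)
fixed = convert(r"<x\]>", unstash_in_html=True)
print(repr(buggy))
print(repr(fixed))
```

With the flag off, the stashed html still carries the escape placeholder, which is the symptom Yuri reported. With the flag on, the placeholder is resolved before the html is stashed.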
Option 2 would certainly be the easiest to implement. But there is a
part of me that wants to do option 1. A little while back I actually
started a new preprocessor that used Python's builtin real html
parser. It actually worked - until I included autolinks/automails in a
document. That broken implementation is in a branch on Gitorious if
anyone is interested. Unfortunately, I failed to commit when I had
working code and don't remember what I did that broke it when trying
(unsuccessfully) to work around the autolink issue. When I realized
what I did, I just committed everything as is and figured I'd come back
to it later.
> - yuri
>
> On Mon, Feb 22, 2010 at 10:57 AM, Yuri Takhteyev <qar...@gm...> wrote:
>> Interesting. Here is a much simpler test case triggering this:
>>
>> md.convert("<x\]>")
>>
>> Even without any extensions, using the version from git, we get:
>>
>> u'<x\x02klzzwxh:0000\x03>'
>>
>> - yuri
>>
>> On Mon, Feb 22, 2010 at 9:08 AM, Tom Ritter <to...@ri...> wrote:
>>> klzzwxh:0000
>>
>
> _______________________________________________
> Python-markdown-discuss mailing list
> Pyt...@li...
> https://lists.sourceforge.net/lists/listinfo/python-markdown-discuss
>
--
----
\X/ /-\ `/ |_ /-\ |\|
Waylan Limberg