Re: [Python-markdown-discuss] GSoC progress

On Thu, Jul 3, 2008 at 8:09 PM, Yuri Takhteyev <qar...@gm...> wrote:
>> On 3-Jul-08, at 5:39 PM, Artem Yunusov wrote:
>>> As far as I understand, all the HTML from the input is replaced by
>>> placeholders and inserted back only after serialization. So it
>>> won't be a problem in this case.
>
> Yes, Artem is right: we are now not attempting to parse HTML submitted
> by the user; we just pass it through.  This is what most (all?)
> markdown implementations do.  This also means that if the user
> supplies bad HTML (or HTML that is not XHTML), then they will get back
> what they gave us.  Garbage in, garbage out.
>
> The consensus on the markdown list seems to have been that policing
> HTML submitted by the user (which would include looking out for XSS
> attacks) should be left to the client, who should filter the output of
> markdown.
>
>> For some reason I was under the impression that the "instant html/xhtml
>> output option" meant "html which includes html in the input".
>

Sorry if I misled anyone with that statement. I'm with Yuri (and the
Markdown community at large): we don't fix bad input. In fact, the raw
HTML never gets put into the DOM anyway. It's stored as plain text and
added back in after the DOM is converted into a string. That means
we can't really hand the user a DOM object to do with as they please,
because we aren't done with it yet (it still holds placeholders rather
than the raw HTML). However, we could add a keyword to
`Markdown.convert()` that specifies the output format (HTML or XHTML)
and pass that on to the DOM on serialization.
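
To make the flow above concrete, here is a rough sketch, not the actual
Python-Markdown code. The `convert()` function, its `output_format`
keyword, and the placeholder format are all made up for illustration, and
it uses the standard library's ElementTree as a stand-in for whatever
tree we end up with. Raw HTML is stashed as plain text, only a
placeholder goes into the tree, and the keyword does nothing more than
pick the serializer method.

```python
import xml.etree.ElementTree as ET

# All names here are hypothetical; this is not Python-Markdown's real code.
PLACEHOLDER = "\x02html:%d\x03"  # control characters are unlikely in user text


def convert(text, output_format="xhtml"):
    """Toy converter: paragraphs, '---' rules, and raw-HTML passthrough."""
    stash = []  # raw HTML blocks, kept as plain strings outside the tree
    root = ET.Element("div")
    for block in text.strip().split("\n\n"):
        block = block.strip()
        if block == "---":
            ET.SubElement(root, "hr")
        elif block.startswith("<"):
            # Raw HTML never enters the tree; only a placeholder does.
            stash.append(block)
            ET.SubElement(root, "p").text = PLACEHOLDER % (len(stash) - 1)
        else:
            ET.SubElement(root, "p").text = block

    # The keyword only selects the serializer method.
    method = "html" if output_format == "html" else "xml"
    out = ET.tostring(root, encoding="unicode", method=method)

    # Put the raw HTML back, untouched, after serialization.
    for i, raw in enumerate(stash):
        out = out.replace("<p>%s</p>" % (PLACEHOLDER % i), raw)
    return out


sample = "Hello & welcome\n\n<div class=foo>bad <b>html</div>\n\n---"
print(convert(sample, "xhtml"))
print(convert(sample, "html"))
```

With this sample, the "xhtml" call should serialize the rule as `<hr />`
and the "html" call as `<hr>`, while the malformed `<div>` comes back
exactly as it was given in both cases.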

Btw, if anyone is interested in the performance of HTML serializers and
parsers in Python, here's a decent comparison:
http://blog.ianbicking.org/2008/03/30/python-html-parser-performance/

-- 
----
Waylan Limberg
wa...@gm...