From: Waylan L. <wa...@gm...> - 2008-07-04 03:00:14
On Thu, Jul 3, 2008 at 8:09 PM, Yuri Takhteyev <qar...@gm...> wrote:
>> On 3-Jul-08, at 5:39 PM, Artem Yunusov wrote:
>>> As far as I understand all the HTML from input replacing by
>>> placeholders, and then inserting back only after serialization. So, it
>>> won't be a problem in this case.
>
> Yes, Artem is right, we are now not attempting to parse HTML submitted
> by the user, we just pass it through. This is what most (all?)
> markdown implementations do. This also means that if the user
> supplies bad HTML (or HTML that is not XHTML), then they will get back
> what they gave us. Garbage in, garbage out.
>
> The consensus on the markdown list seems to have been that policing
> HTML submitted by the user (which would include looking out for XSS
> attacks) should be left to the client, who should filter the output of
> markdown.
>
>> For some reason I was under the impression that the "instant html/
>> xhtml output option" meant "html which includes html in the input".
> Sorry if I misled anyone with that statement.

I'm with Yuri (and the Markdown community at large). We don't fix bad
input. In fact, the raw HTML never gets put into the DOM anyway. It's
stored as plain text and added back in after the DOM is converted into
a string. Which means that we can't really pass the user a DOM object
for them to do as they please because we aren't done with it yet.

However, we could add a keyword to `Markdown.convert()` that specifies
the output format of html or xhtml and pass that on to the DOM on
serialization.

Btw, if anyone is interested in performance of HTML serializers and
parsers in Python, here's a decent comparison:

http://blog.ianbicking.org/2008/03/30/python-html-parser-performance/

--
----
Waylan Limberg
wa...@gm...
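
To make the placeholder idea above concrete, here is a rough sketch of how it might look. This is not the actual Markdown code: `HtmlStash`, the placeholder format, the standalone `convert()` function and its `output_format` keyword are all hypothetical names made up for illustration, and the standard library's `xml.etree.ElementTree` stands in for whatever DOM is really used. Raw HTML is swapped for a placeholder before the tree is built, the tree is serialized as html or xhtml, and only then is the raw text put back untouched.

```python
import xml.etree.ElementTree as ET


class HtmlStash:
    """Hypothetical store for raw HTML that must bypass the tree."""

    def __init__(self):
        self.blocks = []

    def store(self, raw_html):
        # Keep the raw text and hand back a plain-text placeholder.
        self.blocks.append(raw_html)
        return "STASH-%d-PLACEHOLDER" % (len(self.blocks) - 1)

    def restore(self, text):
        # Runs on the serialized string, so bad HTML never touches the tree.
        for i, raw in enumerate(self.blocks):
            placeholder = "STASH-%d-PLACEHOLDER" % i
            # Drop the <p> wrapper around a stashed block, then any leftovers.
            text = text.replace("<p>%s</p>" % placeholder, raw)
            text = text.replace(placeholder, raw)
        return text


def convert(blocks, output_format="xhtml"):
    """Toy converter: each block is a paragraph, raw HTML, or '---'."""
    stash = HtmlStash()
    root = ET.Element("div")
    for block in blocks:
        if block == "---":
            ET.SubElement(root, "hr")  # empty element, differs per format
        else:
            p = ET.SubElement(root, "p")
            is_raw = block.lstrip().startswith("<")
            p.text = stash.store(block) if is_raw else block
    # The keyword is only consulted at serialization time.
    method = "html" if output_format == "html" else "xml"
    serialized = ET.tostring(root, encoding="unicode", method=method)
    return stash.restore(serialized)


if __name__ == "__main__":
    source = ["Fish & chips", "---", "<br>unclosed <b>tag"]
    print(convert(source, output_format="xhtml"))
    print(convert(source, output_format="html"))
```

Note that the unclosed `<b>` comes back exactly as it went in, in both formats, which is the "garbage in, garbage out" behaviour described above. Only the elements built by the converter itself (the `hr` here) change with the output format.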