Re: [Python-markdown-discuss] Markdown escape function?

> If you need to filter input from an untrusted source, then you
> should not filter the Markdown input, but the HTML output instead.
> For a detailed explanation of why, see this article: 
> https://michelf.ca/blog/2010/markdown-and-xss/

Thank you for the pointer, but I disagree with the main conclusion that
there is "no other choice":

> So the conclusion is that, if you want real security, you need to
> filter Markdown’s output, not the input. **There’s no other choice.**

but I would _like_ to be proven wrong so I can improve [1] (maybe with
an example XSS payload that bypasses [1]).

Why do I disagree?
The blog post shows an example with a (poorly written) "XSS filter".

The problem with "filter the HTML output, not the Markdown input" is that
I'm not in a position to choose: I have to provide Markdown output, not
HTML.
Also, there must be a reason why Markdown provides escape
possibilities. [0]

I claim that it is possible to write a filter that makes untrusted
input, to be used in Markdown output, XSS-safe. The question is
- Is there a known implementation?
- If not: How invasive does such a filter have to be?

In my current approach [1] I simply consider whitelisted characters only
(the rest gets discarded) but I'm unhappy with that - because it is
probably **not safe** and the displayed string is no longer the one
provided by the untrusted source - so I'm looking for something better.

"better" is: output string **looks** (after Markdown got converted to
HTML) exactly like input string _and_ is XSS-safe
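
To make the "escape instead of discard" idea concrete, here is a minimal
sketch. It is untested against real XSS payloads and not a claim that this
is safe. The name escape_md is hypothetical, and the character set is the
backslash-escapable list from [0]; whether a given Markdown implementation
honours every one of these escapes is an assumption.

```python
import html
import re

# Characters that the Markdown syntax description [0] lists as
# backslash-escapable. Assumption: the target renderer honours all of them.
_MD_SPECIAL = re.compile(r'([\\`*_{}\[\]()#+\-.!])')

def escape_md(text):
    """Hypothetical helper: escape metacharacters instead of discarding them."""
    # Collapse whitespace so newlines cannot start block-level constructs.
    text = re.sub(r'\s+', ' ', text).strip()
    # Turn &, < and > into entities so no raw HTML reaches the renderer.
    text = html.escape(text, quote=False)
    # Prefix every Markdown metacharacter with a backslash.
    return _MD_SPECIAL.sub(r'\\\1', text)
```

For example, escape_md("[x](javascript:alert(1))") returns the string
\[x\]\(javascript:alert\(1\)\), which should render as literal text rather
than as a link if the escapes are honoured.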

> In Python I recommend Bleach with this whitelist as a good starting 
> place: https://github.com/yourcelf/bleach-whitelist 
> https://github.com/mozilla/bleach

Yes, I saw your recommendation when reading your documentation [2].
Bleach is for HTML, I need something for Markdown.

thanks for your help,
nusenu

[0] https://daringfireball.net/projects/markdown/syntax#backslash
[2] https://pythonhosted.org/Markdown/reference.html#safe_mode
[1]
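import html
import re

def strip_md(text):
    # Escape &, < and > (html.escape replaces cgi.escape, removed in Python 3.8)
    text = html.escape(text, quote=False)
    # Replace braces, brackets, parentheses and underscores with spaces
    text = re.sub(r'[\{\}\[\]()_]', ' ', text)
    # "." and "-" are Markdown metachars!
    whitelist = r'[0-9a-zA-Z"$%&/\',\.:;=?@\^\- ]'
    # Keep whitelisted characters only; everything else is discarded
    text = "".join(re.findall(whitelist, text))
    text = text.strip()
    # Collapse runs of whitespace into a single space
    text = re.sub(r'\s+', ' ', text)
    return text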