From: nusenu <nus...@ri...> - 2017-05-24 17:40:17
> If you need to filter input from an untrusted source, then you
> should not filter the Markdown input, but the HTML output instead.
> For a detailed explanation of why, see this article:
> https://michelf.ca/blog/2010/markdown-and-xss/

Thank you for the pointer, but I disagree with the main conclusion
that there is "no other choice":

> So the conclusion is that, if you want real security, you need to
> filter Markdown’s output, not the input. **There’s no other choice.**

but I would _like_ to be proven wrong so I can improve [1]
(maybe with an example XSS payload that bypasses [1]).

Why do I disagree?
The blog post shows an example with a (poorly written) "XSS filter".

The problem with "filter the HTML output not the Markdown input" is:
I'm not in the position to choose. I have to provide Markdown output,
not HTML.

Also: There must be a reason for Markdown to provide escape
possibilities. [0]

I claim that it is possible to write a filter that makes untrusted
input, to be used in Markdown output, XSS-safe.

The question is
- Is there a known implementation?
- If not: How invasive does such a filter have to be?

In my current approach [1] I simply consider whitelisted characters only
(the rest gets discarded) but I'm unhappy with that, because it is
probably **not safe** and the displayed string is no longer the one
provided by the untrusted source, so I'm looking for something better.

"better" is:
output string **looks** (after Markdown got converted to HTML) exactly
like input string _and_ is XSS-safe

> In Python I recommend Bleach with this whitelist as a good starting
> place: https://github.com/yourcelf/bleach-whitelist
> https://github.com/mozilla/bleach

Yes, I saw your recommendation when reading your documentation [2].
Bleach is for HTML, I need something for Markdown.

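thanks for your help,
nusenu

[0] https://daringfireball.net/projects/markdown/syntax#backslash
[2] https://pythonhosted.org/Markdown/reference.html#safe_mode

[1]
    import html
    import re

    def strip_md(input):
        # Escape &, < and > (quote=False matches the default of the older cgi.escape)
        input = html.escape(input, quote=False)
        # Replace bracket-like characters and underscores with spaces
        input = re.sub(r'[\{\}\[\]()_]', ' ', input)
        # "." and "-" are Markdown metachars!
        whitelist = r'[0-9a-zA-Z"$%&/\',\.:;=?@\^\- ]'
        # Keep only whitelisted characters; everything else is discarded
        input = "".join(re.findall(whitelist, input))
        input = input.strip()
        # Collapse runs of whitespace into a single space
        input = re.sub(r'\s+', ' ', input)
        return input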
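
For comparison with the whitelist approach in [1], here is a minimal sketch of the escaping idea that [0] points at: instead of discarding characters, backslash-escape the metacharacters listed in [0], so the rendered text should look like the input. The function name `escape_md` is hypothetical. The sketch assumes the Markdown renderer honours exactly the backslash escapes documented in [0] and passes HTML entities through unchanged. It has not been checked against any particular Markdown implementation or extension, so it should not be read as a proven XSS-safe filter.

```python
import html
import re

# Characters documented as backslash-escapable in the Markdown syntax page [0]:
# \ ` * _ { } [ ] ( ) # + - . !
_MD_METACHARS = re.compile(r"([\\`*_{}\[\]()#+\-.!])")


def escape_md(text):
    """Return text with HTML and Markdown metacharacters escaped (sketch only)."""
    # Collapse all whitespace so the input cannot start block-level
    # constructs (lists, code blocks, setext headers) on a new line.
    text = re.sub(r"\s+", " ", text).strip()
    # Turn &, < and > into entities so raw HTML cannot pass through.
    text = html.escape(text, quote=False)
    # Prefix every documented metacharacter with a backslash.
    # The entities produced above contain none of these characters.
    return _MD_METACHARS.sub(r"\\\1", text)


print(escape_md("*bold* <b>"))
print(escape_md("[x](javascript:alert(1))"))
```

Unlike [1], nothing is thrown away here, so the open question becomes whether a given renderer has metacharacters or extensions (tables, autolinked bare URLs, and so on) that fall outside the list in [0].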