From: nusenu <nus...@ri...> - 2017-05-25 16:49:37
Hi Waylan,

thanks for your continued input.

Some more context from my side might help here:

I take the 'contact' (an arbitrary untrusted string) from a backend:
https://onionoo.torproject.org/details?fields=contact

and produce Markdown:
https://raw.githubusercontent.com/nusenu/OrNetStats/master/maincwfamilies.md

Jekyll then produces HTML from these files.

Final output (currently the markup is stripped, so the displayed string does not exactly match the input; the goal is a displayed string that matches the input exactly):
https://nusenu.github.io/OrNetStats/maincwfamilies

> Something to keep in mind is that there is no such thing as invalid
> Markdown.

I hope I didn't say something that would contradict that.

> And then there is the many years that users have been using Markdown
> (over a decade). There is a certain expectation regarding behavior
> that exists today. Most everyone knows and expects that `**foo**`
> will result in bold text. And it is not surprising if some service
> disallows that, but then the expectation is that you will just get
> `foo` back. Getting back `**foo**` would be surprising.
[..]
>
> It is with these long-standing expectations of users in mind that us
> long-time users of Markdown say that the only way to sanitize
> Markdown is by sanitizing the HTML output.

In my case the data source does not have any Markdown expectations (the source does not even know there is Markdown in the middle, or what Markdown is). The source expects literal output.

obfuscated**dot**emailaddress**dot**tld

should not become:

myobfuscateddotemailaddressdottld

In these examples I used "*", but this is not limited to that character; it applies to any metachar plus the pipe sign (since I use the table extension).

> As a practical matter, to sanitize the Markdown text before passing
> it to the parser would require writing another parser, just one that
> removes/escapes the disallowed markup. If that is what you really
> want

Yes :)

> then just use Python-Markdown extension API to remove the
> “strongPattern” from the parser.

Is it possible to use your API to do Markdown escaping? (the initial question)

input -> output examples:

    **foo**      -> \*\*foo\*\*
    dot*foo      -> dot*foo       (no backslash)
    1.           -> 1\.
    example.com  -> example.com   (no backslash)
    ...(and all other meta chars)

> If that is the behavior
> you really want, that is the easiest way to get it. But I expect your
> users will very much dislike it.

As stated above, no worries here: the data source does not know about Markdown and therefore has no expectations.

Below you find my conversation with python-help. It is relevant here, and it is the reason I'm asking again: I need a Markdown-aware escape function, not a simple search/replace.

------------

Matt (python-help) wrote:
> However, I also think that having an escape-the-markup
> function in a markup library makes perfect sense.

>> (simply replacing all metachars with \metachar does not work)
>
> But if you can be more specific about how that doesn't work, we may
> be able to help suggest something.

To test that approach, I wrote a very simple function that replaces all Markdown metachars:

    def escape_md(text):
        # Escape backslashes first so the escapes added below are not doubled.
        text = text.replace('\\', '\\\\')
        text = text.replace('`', '\\`')
        text = text.replace('*', '\\*')
        ...
        return text

The problem with this approach is that it does not care about the context.

So this works fine against

    "**foo**" -> HTML: **foo**

but it does not work here:

    "d*ot" -> HTML: d\*ot

and similarly with other chars: .([`-

So in the end the escape function has to be Markdown-aware, and at that point I guess it is no longer a simple search-and-replace and should be part of a Markdown library (which is already aware of the syntax anyway).
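To make the request more concrete, here is a minimal sketch of what a context-aware escape could look like. It is only a heuristic and not part of Python-Markdown or any other library; the names escape_md_line and escape_md_contextual are made up for illustration. It assumes that "*", "_" and "`" can only act as markup when the same character occurs at least twice on a line, that a list marker only counts at the start of a line, and that the pipe always needs escaping because of the table extension. Links, HTML, headings and blockquotes are deliberately not handled.

```python
import re

# Delimiters that can only form markup when they occur at least twice on a line.
PAIRED_DELIMITERS = "*_`"

# Ordered-list marker at the start of a line: digits, a dot, then whitespace or end.
ORDERED_LIST = re.compile(r"^(\s*\d+)\.(?=\s|$)")

# Bullet-list marker at the start of a line, followed by whitespace or end.
BULLET = re.compile(r"^(\s*)([-+*])(?=\s|$)")


def escape_md_line(line):
    """Escape a single line of untrusted text (hypothetical helper, heuristic only)."""
    # Escape backslashes first so the escapes added below are not doubled.
    line = line.replace("\\", "\\\\")
    # With the table extension enabled, a pipe always separates cells.
    line = line.replace("|", "\\|")
    # A lone delimiter cannot open and close a span, so leave it untouched.
    for ch in PAIRED_DELIMITERS:
        if line.count(ch) >= 2:
            line = line.replace(ch, "\\" + ch)
    # "1." is a list marker only at the start of a line, not inside "example.com".
    line = ORDERED_LIST.sub(r"\1\\.", line)
    # A single leading "*", "-" or "+" followed by a space would start a bullet list.
    line = BULLET.sub(r"\1\\\2", line)
    return line


def escape_md_contextual(text):
    """Apply escape_md_line to every line of the input."""
    return "\n".join(escape_md_line(line) for line in text.split("\n"))
```

With these rules, "**foo**" becomes "\*\*foo\*\*" and "1." becomes "1\.", while "dot*foo" and "example.com" pass through unchanged. The sketch errs on the side of over-escaping (for example, every "_" in "snake_case_name" gets a backslash), which is only harmless if the renderer honors backslash escapes in that position.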