From: Herbert P. <her...@gm...> - 2007-06-12 21:32:05
Hi,

Is it somehow possible to not remove HTML but instead escape it with HTML entities? This seems to be a much more user-friendly way for wikis to deal with HTML.

I tried to simply put in a preprocessor, but had no luck yet, basically because I'm not sure I fully understand how the current implementation removes HTML (like why HTML is escaped in code blocks and not fully removed). Is there an easy way to do this?

Thanks & cu,
herbert

P.S.: @Yuri Takhteyev: I guess you don't really care any more since you've already put up a wiki, but anyway: http://sct.sphene.net/ is my wiki based on python-markdown (and django).
From: Yuri T. <qar...@gm...> - 2007-06-13 03:02:05
You should be able to do this with a preprocessor by simply pre-escaping all HTML, no? Alternatively, if you want a quick and dirty hack, look for the line that says:

    if self.safeMode and html != "<hr />" and html != "<br />":
        html = HTML_REMOVED_TEXT

I do agree though that perhaps escaping HTML would be a better default. (Please do file a bug on sourceforge so that I don't forget to make this change later.) In the long term, perhaps, the new and more flexible way of managing pre-post-etc-processors would solve this problem as well.

> implementation removes HTML . (like why HTML is escaped in code blocks
> and not fully removed) ..

An oversight on my part...

> P.S.: @Yuri Takhteyev: i guess you don't really care any more since
> you've already put up a wiki .. but anyway .. http://sct.sphene.net/
> is my wiki based on python-markdown (and django)

I will stick with what I installed, but I do _care_ - it's good to have a wiki based on this module. Please add your project to the wiki under "Related Projects".

- yuri

--
http://www.freewisdom.org/
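The pre-escaping idea above could be sketched roughly as follows. This is a standalone function, not wired into python-markdown's preprocessor machinery; the names `pre_escape` and `_BARE_AMP` are made up for illustration. It assumes that `>` should be left alone so that Markdown blockquote markers keep working, and that existing entities should not be escaped a second time.

```python
import re

# Matches '&' that does not already begin an entity such as &amp; or &#169;
# (hypothetical helper, not part of python-markdown).
_BARE_AMP = re.compile(r"&(?!#?\w+;)")


def pre_escape(text):
    """Escape raw HTML in Markdown source before conversion (sketch only)."""
    # Escape bare ampersands first so the entity added for '<' is not touched.
    text = _BARE_AMP.sub("&amp;", text)
    # Leave '>' alone: it is also the Markdown blockquote marker.
    return text.replace("<", "&lt;")


print(pre_escape("<b>a & b</b>"))
print(pre_escape("> a quote &amp; more"))
```

One likely downside of this approach is that it also rewrites text inside code spans and code blocks, where Markdown does its own escaping, and it disables `<http://...>` autolinks.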
From: Waylan L. <wa...@gm...> - 2007-11-05 05:23:55
I've just committed a patch to svn (r53) that provides a nice middle ground to the escaping vs. removing HTML issue. The old behavior is still the default, but escaping is provided as an option.

Currently, the global variable `HTML_REMOVED_TEXT` holds the text that is used for replacement. I set it up so that if that string is empty (or otherwise evaluates to `False` in Python) then the HTML is escaped instead. In other words, you turn escaping on in the same way that you change the replacement text. Here's an example:

    >>> import markdown
    >>> markdown.HTML_REMOVED_TEXT = ''
    >>> md = markdown.Markdown(safe_mode=True)
    >>> md.convert('<a href="foo">foo</a> bar.')
    '<p>&lt;a href="foo"&gt;foo&lt;/a&gt; bar.\n</p>'

I left the default as the old behavior, but that could easily be switched. I also considered adding a new global (perhaps `ESCAPE_HTML`) which would simply hold a True/False value, but couldn't see adding an additional variable. If anyone feels otherwise, let me know.

I see one potential problem with my solution which I hadn't considered until just now (after committing my patch). One could already have code that sets `HTML_REMOVED_TEXT` to an empty string so that all HTML is stripped and replaced with nothing. Some may prefer such a behavior. This makes that impossible to do. Is anyone doing this? Adding `ESCAPE_HTML` would address this issue, if it is one.

Another solution would be to change the expected values of the `safe_mode` parameter for Markdown() to one of 'strip', 'escape', or None rather than True/False. But that could get complicated/confusing.

Oh, and obviously, the value of `HTML_REMOVED_TEXT` can be changed in the source file if one will always want that behavior. That can become a headache on upgrading to a new version though. It's usually better to future-proof your code IMO.

I should also mention that I moved the code that does the escaping/removing from the convert method to a text-post-processor. It makes more sense there regardless of this change IMO and simplifies the process of making your own extension to change the behavior. Extensions would be another way to address the issues I mention above. Perhaps we could just leave it at that.

The escaping is very basic. Any improvements are welcome. Anyone know of a method already available in the Python standard lib?

Any objections, comments, suggestions are welcome.

--
----
Waylan Limberg
wa...@gm...
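The switch described in this message (escape when the replacement text is falsy, otherwise substitute it) might look something like the sketch below. It is not the code committed in r53; `basic_escape` and `handle_raw_html` are hypothetical names, and the value given to `HTML_REMOVED_TEXT` here is only a placeholder for the module-level setting.

```python
# Placeholder value for this sketch; stands in for the module-level setting.
HTML_REMOVED_TEXT = "[HTML_REMOVED]"


def basic_escape(html):
    """Very basic escaping of the characters that would be read as markup."""
    # '&' must be handled first so the entities added below are not re-escaped.
    html = html.replace("&", "&amp;")
    html = html.replace("<", "&lt;")
    return html.replace(">", "&gt;")


def handle_raw_html(html, removed_text=HTML_REMOVED_TEXT):
    """Substitute removed_text if it is truthy, otherwise escape the HTML."""
    if removed_text:
        return removed_text
    return basic_escape(html)


print(handle_raw_html('<a href="foo">foo</a>'))      # replacement text
print(handle_raw_html('<a href="foo">foo</a>', ""))  # escaped markup
```

The drawback raised in the message is visible here: passing an empty string can no longer mean "strip the HTML and put nothing in its place".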
From: Trent M. <tr...@gm...> - 2007-11-06 05:37:11
> The escaping is very basic. Any improvements are welcome. Anyone know
> of a method already available in the python standard lib?

    >>> import cgi
    >>> cgi.escape("<a href='blah'>foo & bar</a>")
    "&lt;a href='blah'&gt;foo &amp; bar&lt;/a&gt;"

Trent

--
Trent Mick
tr...@gm...
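For readers on current Python versions: `cgi.escape` was deprecated in Python 3.2 and removed in Python 3.8. The standard-library replacement is `html.escape`, which also escapes quotes unless `quote=False` is passed. A short sketch of the equivalent call:

```python
import html

sample = "<a href='blah'>foo & bar</a>"

# Default: quotes are escaped too (" becomes &quot;, ' becomes &#x27;).
escaped_default = html.escape(sample)

# quote=False escapes only &, < and >, matching the old cgi.escape default.
escaped_no_quotes = html.escape(sample, quote=False)

print(escaped_default)
print(escaped_no_quotes)
```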
From: Yuri T. <qar...@gm...> - 2007-11-05 06:08:37
> until just now (after committing my patch). One could already have
> code that sets `HTML_REMOVED_TEXT` to an empty string so that all html
> is stripped and replaced with nothing. Some may prefer such a
> behavior. This makes that impossible to do. Is anyone doing this?

This does seem like a reasonable thing to allow. Why not use None instead of empty string as the code for escaping, testing for type(HTML_REMOVED_TEXT) == "string"?

> Another solution would be to change the expected values of the
> `safe_mode` parameter for Markdown() to one of 'strip', 'escape', or
> None rather than True/False. But that could get complicated/confusing.

This is actually quite reasonable, except that we could make it more backwards compatible by saying that safe_mode = None would turn it off, safe_mode = "escape" would escape the HTML, and "remove" or any other non-false value would replace HTML with the value of HTML_REMOVED_TEXT. I think for the documentation we should tell people to put "replace", but the actual code should treat any true value other than "escape" as meaning "removed".

> I should also mention that I also moved the code that does the
> escaping/removing from the convert method to a text-post-processor. It
> makes more sense there regardless of this change IMO and simplifies
> the process of making your own extension to change the behavior.
> Extensions would be another way to address the issues I mention above.
> Perhaps we could just leave it at that.

I am glad you did it, but it would be nice to have a simpler solution that does not depend on grokking extensions.

Thanks for all the work! When do you think we should make a release of 1.7?

- yuri

--
Yuri Takhteyev
Ph.D. Candidate, UC Berkeley School of Information
http://takhteyev.org/, http://www.freewisdom.org/
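The backwards-compatible `safe_mode` scheme proposed in this message could be sketched as below. The function name `apply_safe_mode` and the value of `HTML_REMOVED_TEXT` are assumptions made for illustration; this is not the library's implementation, and `html.escape` is used only as a convenient stand-in for the escaping step.

```python
import html

# Placeholder value for this sketch; stands in for the module-level setting.
HTML_REMOVED_TEXT = "[HTML_REMOVED]"


def apply_safe_mode(raw_html, safe_mode=None):
    """Dispatch on safe_mode as proposed: off, "escape", or replace."""
    if not safe_mode:
        # None (or any false value): safe mode is off, HTML passes through.
        return raw_html
    if safe_mode == "escape":
        return html.escape(raw_html, quote=False)
    # True, "replace", "remove" or any other true value: old behavior.
    return HTML_REMOVED_TEXT


for mode in (None, "escape", "replace", True):
    print(repr(mode), "->", apply_safe_mode("<i>x</i>", mode))
```

Because any true value other than "escape" falls through to replacement, existing code that passes `safe_mode=True` would keep working unchanged under this scheme.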
From: Waylan L. <wa...@gm...> - 2007-11-05 14:15:28
On 11/5/07, Yuri Takhteyev <qar...@gm...> wrote:
> > until just now (after committing my patch). One could already have
> > code that sets `HTML_REMOVED_TEXT` to an empty string so that all html
> > is stripped and replaced with nothing. Some may prefer such a
> > behavior. This makes that impossible to do. Is anyone doing this?
>
> This does seem like a reasonable thing to allow. Why not use None
> instead of empty string as the code for escaping, testing for
> type(HTML_REMOVED_TEXT) == "string"?

After sending this message last night, I realized this isn't that big of a problem. I'm currently testing by doing `if HTML_REMOVED_TEXT:` so `False`, 0, an empty string, and `None` will all result in escaping. What I missed last night is that a string containing one space will evaluate to True and trigger replacing rather than escaping. Seeing as whitespace is a non-issue in HTML anyway, this seems like a reasonable solution.

> > Another solution would be to change the expected values of the
> > `safe_mode` parameter for Markdown() to one of 'strip', 'escape', or
> > None rather than True/False. But that could get complicated/confusing.
>
> This is actually quite reasonable, except that we could make it more
> backwards compatible by saying that safe_mode = None would turn
> it off, safe_mode = "escape" would escape the HTML, and "remove" or
> any other non-false value would replace HTML with the value of
> HTML_REMOVED_TEXT. I think for the documentation we should tell
> people to put "replace", but the actual code should treat any true
> value other than "escape" as meaning "removed".

The more I think about it, the more I'm inclined to want a way to turn escaping on as a parameter, so I think I'll leave things the way they are, except that if safe_mode == "escape" we force escaping regardless of the value of HTML_REMOVED_TEXT. That seems to allow the most possibilities without extensions.

> > I should also mention that I also moved the code that does the
> > escaping/removing from the convert method to a text-post-processor. It
> > makes more sense there regardless of this change IMO and simplifies
> > the process of making your own extension to change the behavior.
> > Extensions would be another way to address the issues I mention above.
> > Perhaps we could just leave it at that.
>
> I am glad you did it, but it would be nice to have a simpler solution
> that does not depend on grokking extensions.
>
> Thanks for all the work! When do you think we should make a release of 1.7?

I should update the escaping tonight from this discussion, and don't have anything else for the immediate future, so whenever you're ready. I'll let you make those unicode changes that were discussed. You seem to understand that better than me anyway. Or was that just a documentation issue?

--
----
Waylan Limberg
wa...@gm...
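The truthiness point made in this message, and the rule that `safe_mode == "escape"` forces escaping whatever the replacement text is, can be checked with a small sketch. The function `resolve` is a hypothetical name, and the logic follows the description in the message rather than the committed code.

```python
import html


def resolve(raw_html, safe_mode, removed_text):
    """Combine the two rules described above (sketch, not the real code)."""
    if not safe_mode:
        return raw_html
    # "escape" forces escaping; a falsy removed_text also falls back to it.
    if safe_mode == "escape" or not removed_text:
        return html.escape(raw_html, quote=False)
    return removed_text


# False, 0, "" and None are all falsy, so each one would trigger escaping.
falsy_values = [v for v in (False, 0, "", None) if not v]

# A single space is a non-empty string, hence truthy: it triggers replacing.
space_is_truthy = bool(" ")

print(len(falsy_values), space_is_truthy)
print(repr(resolve("<b>", True, " ")))
print(resolve("<b>", "escape", "[HTML_REMOVED]"))
```

Setting the replacement text to a single space is therefore a way to strip HTML almost invisibly while keeping the empty string reserved for escaping.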
From: Yuri T. <qar...@gm...> - 2007-11-05 16:09:53
> I should update the escaping tonight from this discussion, and don't
> have anything else for the immediate future, so whenever you're ready.
> I'll let you make those unicode changes that were discussed. You seem
> to understand that better than me anyway. Or was that just a
> documentation issue?

Ok, I'll make them and update the documentation.

- yuri

--
Yuri Takhteyev
Ph.D. Candidate, UC Berkeley School of Information
http://takhteyev.org/, http://www.freewisdom.org/
From: Waylan L. <wa...@gm...> - 2007-11-05 21:02:14
I've finished my updates. I've even updated the change_log for you. Feel free to release anytime.

I should note that I decided to remove the behavior where an empty `HTML_REMOVED_TEXT` triggers escaping, since one would have to set safe_mode anyway. That seemed redundant once I started writing documentation.

Btw, I did some work on the documentation [1]. If you like the format, I'll do the same for the other pages.

For a full rundown of the new safe_mode functionality see that page. The italicized note can be removed upon release (or I can remove the section now and add it back upon release if preferred).

[1]: http://www.freewisdom.org/projects/python-markdown/Using_as_a_Module

--
----
Waylan Limberg
wa...@gm...
From: Waylan L. <wa...@gm...> - 2007-11-05 21:38:04
Oh, I almost forgot to add escaping to the command line interface. It's there now, but I'm not sure I like it. I rarely, if ever (except maybe when testing), use the command line interface, so if anyone else has any input, let me know.

--
----
Waylan Limberg
wa...@gm...