From: Waylan L. <wa...@gm...> - 2007-11-07 21:42:58
I may not be exactly right here, but my understanding is that Mercurial, git and BZR are all 'distributed', while, of course, SVN is not. Mercurial and git, however, have a very different way of working (or at least different commands/workflow/API) than SVN/CVS. BZR, while more similar to Mercurial and git under the hood, offers commands and a workflow that SVN users will feel more comfortable with. I see it as a nice middle ground. And if you want, you can use BZR in a remote-server type of situation as well. The best of both worlds.

I should mention that this is all based upon very little time with Mercurial and git, neither of which I currently even have installed. However, I am happily using both bzr and svn and don't really care which way we go.

On 11/7/07, Yuri Takhteyev <qar...@gm...> wrote:
> Well, unfortunately Trac installation turned out to be a nightmare due to a zillion unsatisfiable dependencies. I had to give up on it. Bazaar is still on the table, but I've exhausted my time quota for playing with random new software and will have to take a break for a few weeks.
>
> I think I am convinced theoretically about the advantages that Bazaar might offer over SVN, but it seems like there are a few other tools competing in that space, e.g. Mercurial, git, etc. I've been wondering what the advantages and disadvantages are between those.
>
> - yuri

--
----
Waylan Limberg
wa...@gm...

From: Yuri T. <qar...@gm...> - 2007-11-07 18:23:06
Well, unfortunately Trac installation turned out to be a nightmare due to a zillion unsatisfiable dependencies. I had to give up on it. Bazaar is still on the table, but I've exhausted my time quota for playing with random new software and will have to take a break for a few weeks.

I think I am convinced theoretically about the advantages that Bazaar might offer over SVN, but it seems like there are a few other tools competing in that space, e.g. Mercurial, git, etc. I've been wondering what the advantages and disadvantages are between those.

- yuri

On Nov 7, 2007 11:49 AM, Ben Wilson <da...@gm...> wrote:
> For my two cents, I have played both with Bazaar and SVN. I like both. If you are going to have multiple trusted developers, BZR is a great option because it offers distribution of the repository. SVN is good if you want to retain maximum control of the source code, IMO. BZR works best with ASCII/ISO text; SVN handles binary a bit better. SVN also has the advantage of being around for a while longer, but at the expense of having the RCS/CVS mentality about repository management.
>
> Regarding tools, Trac is a nice Python tool. It has an integrated wiki for all tickets, etc. The only weakness I perceive in Trac is the inability to customize the wiki syntax. Otherwise, it would be great to integrate Markdown with Trac and then be able to tout eating one's own dog food. I'm presently stalled on development (curse of the newborn and that whole family responsibility thing), but I was using Trac with BZR.
>
> Having offered some feedback, I'm generally a lurker and know the community will make the right choice. :-)

--
Yuri Takhteyev
Ph.D. Candidate, UC Berkeley School of Information
http://takhteyev.org/, http://www.freewisdom.org/

From: Ben W. <da...@gm...> - 2007-11-07 17:49:04
On 10/30/07, Yuri Takhteyev <qar...@gm...> wrote:
> I am curious if anyone else has opinions on that?
>
> First, on Bazaar vs. SVN. I personally don't have much invested in SVN and don't care that much. What I worry about is: will moving to Bazaar raise the bar for other people who want to check out? In theory, it seems, Bazaar is specifically designed to make it easier for new people to join in. On the other hand, I am wondering if people might be turned off by having to install a new VCS.
>
> Second, any comments on Roundup and Launchpad? Launchpad seems to have a nice community thing going, so if we want to switch to Bazaar this might be a nice option.

For my two cents, I have played both with Bazaar and SVN. I like both. If you are going to have multiple trusted developers, BZR is a great option because it offers distribution of the repository. SVN is good if you want to retain maximum control of the source code, IMO. BZR works best with ASCII/ISO text; SVN handles binary a bit better. SVN also has the advantage of being around for a while longer, but at the expense of having the RCS/CVS mentality about repository management.

Regarding tools, Trac is a nice Python tool. It has an integrated wiki for all tickets, etc. The only weakness I perceive in Trac is the inability to customize the wiki syntax. Otherwise, it would be great to integrate Markdown with Trac and then be able to tout eating one's own dog food. I'm presently stalled on development (curse of the newborn and that whole family responsibility thing), but I was using Trac with BZR.

Having offered some feedback, I'm generally a lurker and know the community will make the right choice. :-)

--
Ben Wilson
"Words are the only thing which will last forever" Churchill

From: Trent M. <tr...@gm...> - 2007-11-06 05:57:46
> If Trent's test script does all that ours does and more, we could just use it instead.

Please feel free. The main generic harness support is test/testlib.py (it provides TestSkipped, which is sometimes useful, a listing of available test cases, and tagging and command-line filtering to run specific tests). The markdown2-specific harness (test_markdown2.py) generates individual unittest.TestCase methods for each text file in the given test case dirs. It should be very straightforward for you to re-use it.

Trent

--
Trent Mick
tr...@gm...

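For readers following along, the idea of generating one unittest.TestCase method per test file can be sketched roughly as below. This is not the code from test_markdown2.py; `convert`, `make_test`, `build_test_case` and `run_demo` are hypothetical names, and `convert` is a trivial stand-in for the real converter so the sketch runs on its own.

```python
import tempfile
import unittest
from pathlib import Path


def convert(text):
    """Hypothetical stand-in for the real converter: wraps text in <p>."""
    return "<p>%s</p>" % text.strip()


def make_test(text_path, html_path):
    """Build one test method comparing convert(input) with the expected HTML."""
    def test(self):
        expected = html_path.read_text().strip()
        self.assertEqual(convert(text_path.read_text()), expected)
    return test


def build_test_case(case_dir):
    """Create a TestCase subclass with one test_<name> method per .text file."""
    methods = {}
    for text_path in sorted(Path(case_dir).glob("*.text")):
        html_path = text_path.with_suffix(".html")
        if html_path.exists():
            methods["test_" + text_path.stem] = make_test(text_path, html_path)
    return type("GeneratedCases", (unittest.TestCase,), methods)


def run_demo():
    """Write two sample cases to a temp dir, run them, and return the result."""
    with tempfile.TemporaryDirectory() as tmp:
        for name, text in (("hello", "Hello"), ("world", "World")):
            Path(tmp, name + ".text").write_text(text + "\n")
            Path(tmp, name + ".html").write_text("<p>%s</p>\n" % text)
        case = build_test_case(tmp)
        suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
        result = unittest.TestResult()
        suite.run(result)
        return result
```

Because each file becomes its own method, a failure is reported per test case rather than as one big failing loop.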
From: Yuri T. <qar...@gm...> - 2007-11-06 05:51:56
Yes, I think you are right. I used the "ext-x-tables" approach in my test script because I was looking for a simple solution. I only wanted to test one extension at a time, and for that simple case this worked well, since you could see from the directory name what is being tested.

If we want to preserve this feature, perhaps the thing to do would be to check for some pattern in the directory name, and look for a .opts file in that case. I.e.: all tests in "basic" would be run with default options, all tests in "ext-x-tables" would be run with the tables extension, and all tests in "basic-with-options" would look for a .opts file for each test. Or we could just always look for .opts.

In either case, has anyone looked at both test scripts enough to report on what the pros and cons of each are? If Trent's test script does all that ours does and more, we could just use it instead.

- yuri

On Nov 5, 2007 11:32 PM, Trent Mick <tr...@gm...> wrote:
> On 11/5/07, Waylan Limberg <wa...@gm...> wrote:
> > After working on safe_mode to add escaping, I realized that there's no way to test safe_mode in the testing framework. A few possibilities occurred to me.
>
> The way I do it in markdown2.py's test suite is to have a "foo.opts" file beside the "foo.txt" and "foo.html". The test harness looks for a corresponding ".opts" file, reads it, evals it as Python (it must be a Python dict) and passes it to the convert function via kwargs.
>
> For example:
>
> http://python-markdown2.googlecode.com/svn/trunk/test/tm-cases/safe_mode.text
> http://python-markdown2.googlecode.com/svn/trunk/test/tm-cases/safe_mode.html
> http://python-markdown2.googlecode.com/svn/trunk/test/tm-cases/safe_mode.opts
>
> > Thing is, the same text files could easily work with each of the possible modes. So another option would be to include additional html files in the existing directories with a special name (i.e., "escaped-somefile.html"). If those files exist, then additional tests would be run in the appropriate mode.
>
> I've tried that before -- in another project -- and found it too limiting. For example, this kind of thing couldn't be done:
>
> http://python-markdown2.googlecode.com/svn/trunk/test/tm-cases/link_patterns.opts
>
> Cheers,
> Trent

--
Yuri Takhteyev
Ph.D. Candidate, UC Berkeley School of Information
http://takhteyev.org/, http://www.freewisdom.org/

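The directory-name convention proposed above could look something like the following sketch. The function name `options_for` and the `"extensions"` option key are assumptions made for illustration, not the actual test script's interface, and `ast.literal_eval` is used here in place of a plain eval.

```python
import ast
from pathlib import Path


def options_for(test_path):
    """Pick converter options for one test file from its directory name."""
    test_path = Path(test_path)
    dir_name = test_path.parent.name
    if dir_name.startswith("ext-x-"):
        # "ext-x-tables" -> run with the "tables" extension enabled.
        return {"extensions": [dir_name[len("ext-x-"):]]}
    if dir_name.endswith("-with-options"):
        # Per-test options live in a .opts file next to the test input.
        opts_path = test_path.with_suffix(".opts")
        if opts_path.exists():
            return ast.literal_eval(opts_path.read_text())
    # "basic" and anything unrecognised: default options.
    return {}
```

The simpler alternative mentioned above, always looking for a .opts file, would drop the directory checks entirely.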
From: Trent M. <tr...@gm...> - 2007-11-06 05:37:11
> The escaping is very basic. Any improvements are welcome. Anyone know of a method already available in the python standard lib?

    >>> import cgi
    >>> cgi.escape("<a href='blah'>foo & bar</a>")
    "&lt;a href='blah'&gt;foo &amp; bar&lt;/a&gt;"

Trent

--
Trent Mick
tr...@gm...

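A note for anyone reading this thread on a current Python: `cgi.escape` was deprecated in Python 3.2 and removed in 3.8, and `html.escape` is the standard-library replacement. A minimal sketch of the equivalent call (the wrapper name `escape_basic` is just for illustration):

```python
import html


def escape_basic(text):
    """Escape &, < and > but leave quotes alone, like cgi.escape(text)."""
    return html.escape(text, quote=False)


escaped = escape_basic("<a href='blah'>foo & bar</a>")
print(escaped)
```

By default `html.escape` also escapes double and single quotes (`quote=True`), which is the safer choice when the result ends up inside an attribute value.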
From: Trent M. <tr...@gm...> - 2007-11-06 05:32:48
On 11/5/07, Waylan Limberg <wa...@gm...> wrote:
> After working on safe_mode to add escaping, I realized that there's no way to test safe_mode in the testing framework. A few possibilities occurred to me.

The way I do it in markdown2.py's test suite is to have a "foo.opts" file beside the "foo.txt" and "foo.html". The test harness looks for a corresponding ".opts" file, reads it, evals it as Python (it must be a Python dict) and passes it to the convert function via kwargs.

For example:

http://python-markdown2.googlecode.com/svn/trunk/test/tm-cases/safe_mode.text
http://python-markdown2.googlecode.com/svn/trunk/test/tm-cases/safe_mode.html
http://python-markdown2.googlecode.com/svn/trunk/test/tm-cases/safe_mode.opts

> Thing is, the same text files could easily work with each of the possible modes. So another option would be to include additional html files in the existing directories with a special name (i.e., "escaped-somefile.html"). If those files exist, then additional tests would be run in the appropriate mode.

I've tried that before -- in another project -- and found it too limiting. For example, this kind of thing couldn't be done:

http://python-markdown2.googlecode.com/svn/trunk/test/tm-cases/link_patterns.opts

Cheers,
Trent

--
Trent Mick
tr...@gm...

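A rough sketch of the ".opts file beside the test case" approach described above follows. It is not the markdown2 harness itself: `convert` is a hypothetical stand-in, the helper names are invented, and `ast.literal_eval` is swapped in for a plain eval so that the file can only contain a literal dict.

```python
import ast
import tempfile
from pathlib import Path


def convert(text, **opts):
    """Hypothetical converter; safe_mode="escape" escapes raw HTML."""
    if opts.get("safe_mode") == "escape":
        text = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    return "<p>%s</p>" % text.strip()


def load_opts(text_path):
    """Read foo.opts next to foo.text; no file means default options."""
    opts_path = Path(text_path).with_suffix(".opts")
    if not opts_path.exists():
        return {}
    opts = ast.literal_eval(opts_path.read_text())
    if not isinstance(opts, dict):
        raise ValueError("%s must contain a Python dict" % opts_path)
    return opts


def run_case(text_path):
    """Convert one test input with its options and compare to the .html file."""
    text_path = Path(text_path)
    expected = text_path.with_suffix(".html").read_text().strip()
    actual = convert(text_path.read_text(), **load_opts(text_path))
    return actual == expected


def demo():
    """Create a safe_mode case in a temp dir and check that it passes."""
    with tempfile.TemporaryDirectory() as tmp:
        case = Path(tmp, "safe_mode.text")
        case.write_text("<b>hi</b>\n")
        case.with_suffix(".html").write_text("<p>&lt;b&gt;hi&lt;/b&gt;</p>\n")
        case.with_suffix(".opts").write_text('{"safe_mode": "escape"}\n')
        return run_case(case)
```

Since the options are arbitrary keyword arguments, the same mechanism covers cases like link_patterns.opts that a naming convention for the expected-output file could not express.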
From: Waylan L. <wa...@gm...> - 2007-11-06 01:30:58
After working on safe_mode to add escaping, I realized that there's no way to test safe_mode in the testing framework. A few possibilities occurred to me.

We could use the same approach as extensions (well, the way they're supposed to work - currently they're ignored afaict) and have a separate directory for each state.

Thing is, the same text files could easily work with each of the possible modes. So another option would be to include additional html files in the existing directories with a special name (i.e., "escaped-somefile.html"). If those files exist, then additional tests would be run in the appropriate mode.

It also occurs to me that some unit tests may be helpful for testing the API. Maybe some simple tests of safe_mode there would be sufficient. The question is, should any unit tests be part of test-markdown.py or separate?

Anyone have any thoughts? I'm thinking this is what I'm tackling after 1.7 is released.

--
----
Waylan Limberg
wa...@gm...

From: Waylan L. <wa...@gm...> - 2007-11-05 21:38:04
Oh, I almost forgot to add escaping to the command line interface. It's there now, but I'm not sure I like it. I rarely, if ever (except maybe when testing), use the command line interface, so if anyone else has any input, let me know.

On 11/5/07, Waylan Limberg <wa...@gm...> wrote:
> I've finished my updates. I've even updated the change_log for you. Feel free to release anytime.
>
> I should note that I decided to remove escape with HTML_REMOVED_TEXT as an empty string being that one would have to set safe_mode anyway. That seemed redundant once I started writing documentation.
>
> Btw, I did some work on the documentation [1]. If you like the format, I'll do the same for the other pages.
>
> For a full rundown of the new safe_mode functionality see that page. The italicized note can be removed upon release (or I can remove the section now and add it back upon release if preferred).
>
> [1]: http://www.freewisdom.org/projects/python-markdown/Using_as_a_Module

--
----
Waylan Limberg
wa...@gm...

From: Waylan L. <wa...@gm...> - 2007-11-05 21:02:14
I've finished my updates. I've even updated the change_log for you. Feel free to release anytime.

I should note that I decided to remove escape with HTML_REMOVED_TEXT as an empty string being that one would have to set safe_mode anyway. That seemed redundant once I started writing documentation.

Btw, I did some work on the documentation [1]. If you like the format, I'll do the same for the other pages.

For a full rundown of the new safe_mode functionality see that page. The italicized note can be removed upon release (or I can remove the section now and add it back upon release if preferred).

[1]: http://www.freewisdom.org/projects/python-markdown/Using_as_a_Module

On 11/5/07, Yuri Takhteyev <qar...@gm...> wrote:
> > I should update the escaping tonight from this discussion, and don't have anything else for the immediate future, so whenever you're ready. I'll let you make those unicode changes that were discussed. You seem to understand that better than me anyway. Or was that just a documentation issue?
>
> Ok, I'll make them and update the documentation.
>
> - yuri

--
----
Waylan Limberg
wa...@gm...

From: Yuri T. <qar...@gm...> - 2007-11-05 16:09:53
> I should update the escaping tonight from this discussion, and don't have anything else for the immediate future, so whenever you're ready. I'll let you make those unicode changes that were discussed. You seem to understand that better than me anyway. Or was that just a documentation issue?

Ok, I'll make them and update the documentation.

- yuri

--
Yuri Takhteyev
Ph.D. Candidate, UC Berkeley School of Information
http://takhteyev.org/, http://www.freewisdom.org/

From: Waylan L. <wa...@gm...> - 2007-11-05 14:15:28
On 11/5/07, Yuri Takhteyev <qar...@gm...> wrote:
> > until just now (after committing my patch). One could already have code that sets `HTML_REMOVED_TEXT` to an empty string so that all html is stripped and replaced with nothing. Some may prefer such a behavior. This makes that impossible to do. Is anyone doing this?
>
> This does seem like a reasonable thing to allow. Why not use None instead of empty string as the code for escaping, testing for type(HTML_REMOVED_TEXT) == "string"?

After sending this message last night, I realized this isn't that big of a problem. I'm currently testing by doing `if HTML_REMOVED_TEXT:` so `False`, 0, an empty string, and `None` will all result in escaping. What I missed last night is that a string containing one space will equate to True and trigger replacing rather than escaping. Seeing as whitespace is a non-issue in html anyway, this seems like a reasonable solution.

> > Another solution would be to change the expected values of the `safe_mode` parameter for Markdown() to one of 'strip', 'escape', or None rather than True/False. But that could get complicated/confusing.
>
> This is actually quite reasonable, except that we could make it more backwards compatible by saying that safe_mode = None would turn it off, safe_mode = "escape" would escape the HTML, and "remove" or any other non-false value would replace HTML with the value of HTML_REMOVED_TEXT. I think for the documentation we should tell people to put "replace", but the actual code should treat any true value other than "escape" as meaning "removed".

The more I think about it, the more I'm inclined to want a way to turn escaping on as a parameter, so I think I'll leave things the way they are, except that if safe_mode == "escape" we force escaping regardless of the value of HTML_REMOVED_TEXT. That seems to allow the most possibilities without extensions.

> > I should also mention that I also moved the code that does the escaping/removing from the convert method to a text-post-processor. It makes more sense there regardless of this change IMO and simplifies the process of making your own extension to change the behavior. Extensions would be another way to address the issues I mention above. Perhaps we could just leave it at that.
>
> I am glad you did it, but it would be nice to have a simpler solution that does not depend on grokking extensions.
>
> Thanks for all the work! When do you think we should make a release of 1.7?

I should update the escaping tonight from this discussion, and don't have anything else for the immediate future, so whenever you're ready. I'll let you make those unicode changes that were discussed. You seem to understand that better than me anyway. Or was that just a documentation issue?

> - yuri

--
----
Waylan Limberg
wa...@gm...

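The rule settled on in this message can be summarised in a few lines. The sketch below is only an illustration of that decision logic, not the code committed to svn: `sanitize` is a hypothetical helper, the `"[HTML_REMOVED]"` value is a placeholder for whatever the module's replacement text is, and `html.escape` stands in for the module's own basic escaping.

```python
import html

# Placeholder replacement text; the real value lives in the markdown module.
HTML_REMOVED_TEXT = "[HTML_REMOVED]"


def sanitize(raw_html, safe_mode, removed_text=HTML_REMOVED_TEXT):
    """Decide what to emit for a chunk of raw HTML under safe_mode."""
    if not safe_mode:
        # safe_mode off (None/False): raw HTML passes through untouched.
        return raw_html
    if safe_mode == "escape" or not removed_text:
        # "escape" forces escaping; a falsy replacement text also escapes.
        return html.escape(raw_html, quote=False)
    # Any other true value: replace the HTML with the replacement text.
    return removed_text
```

Note that a replacement text of a single space is truthy, so it still triggers replacement, which gives the "strip everything" behaviour discussed above without needing an extra flag.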
From: Yuri T. <qar...@gm...> - 2007-11-05 06:08:37
> until just now (after committing my patch). One could already have code that sets `HTML_REMOVED_TEXT` to an empty string so that all html is stripped and replaced with nothing. Some may prefer such a behavior. This makes that impossible to do. Is anyone doing this?

This does seem like a reasonable thing to allow. Why not use None instead of empty string as the code for escaping, testing for type(HTML_REMOVED_TEXT) == "string"?

> Another solution would be to change the expected values of the `safe_mode` parameter for Markdown() to one of 'strip', 'escape', or None rather than True/False. But that could get complicated/confusing.

This is actually quite reasonable, except that we could make it more backwards compatible by saying that safe_mode = None would turn it off, safe_mode = "escape" would escape the HTML, and "remove" or any other non-false value would replace HTML with the value of HTML_REMOVED_TEXT. I think for the documentation we should tell people to put "replace", but the actual code should treat any true value other than "escape" as meaning "removed".

> I should also mention that I also moved the code that does the escaping/removing from the convert method to a text-post-processor. It makes more sense there regardless of this change IMO and simplifies the process of making your own extension to change the behavior. Extensions would be another way to address the issues I mention above. Perhaps we could just leave it at that.

I am glad you did it, but it would be nice to have a simpler solution that does not depend on grokking extensions.

Thanks for all the work! When do you think we should make a release of 1.7?

- yuri

--
Yuri Takhteyev
Ph.D. Candidate, UC Berkeley School of Information
http://takhteyev.org/, http://www.freewisdom.org/

From: Waylan L. <wa...@gm...> - 2007-11-05 05:23:55
I've just committed a patch to svn (r53) that provides a nice middle ground to the escaping vs. removing html issue. The old behavior is still the default, but escaping is provided as an option.

Currently, the global variable `HTML_REMOVED_TEXT` holds the text that is used for replacement. I set it up so that if that string is empty (or otherwise evaluates to `False` in python) then the html is escaped instead. In other words, you turn escaping on in the same way that you change the replacement text. Here's an example:

    >>> import markdown
    >>> markdown.HTML_REMOVED_TEXT = ''
    >>> md = markdown.Markdown(safe_mode=True)
    >>> md.convert('<a href="foo">foo</a> bar.')
    '<p>&lt;a href="foo"&gt;foo&lt;/a&gt; bar.\n</p>'

I left the default as the old behavior, but that could easily be switched. I also considered adding a new global (perhaps `ESCAPE_HTML`) which would simply hold a True/False value, but couldn't see adding an additional variable. If anyone feels otherwise, let me know.

I see one potential problem with my solution which I hadn't considered until just now (after committing my patch). One could already have code that sets `HTML_REMOVED_TEXT` to an empty string so that all html is stripped and replaced with nothing. Some may prefer such a behavior. This makes that impossible to do. Is anyone doing this? Adding `ESCAPE_HTML` would address this issue, if it is one.

Another solution would be to change the expected values of the `safe_mode` parameter for Markdown() to one of 'strip', 'escape', or None rather than True/False. But that could get complicated/confusing.

Oh, and obviously, the value of `HTML_REMOVED_TEXT` can be changed in the source file if one will always want that behavior. That can become a headache on upgrading to a new version though. It's usually better to future-proof your code IMO.

I should also mention that I also moved the code that does the escaping/removing from the convert method to a text-post-processor. It makes more sense there regardless of this change IMO and simplifies the process of making your own extension to change the behavior. Extensions would be another way to address the issues I mention above. Perhaps we could just leave it at that.

The escaping is very basic. Any improvements are welcome. Anyone know of a method already available in the python standard lib?

Any objections, comments, suggestions are welcome.

On 6/12/07, Yuri Takhteyev <qar...@gm...> wrote:
> You should be able to do this with a preprocessor by simply pre-escaping all HTML, no? Alternatively, if you want a quick and dirty hack, look for the line that says:
>
>     if self.safeMode and html != "<hr />" and html != "<br />":
>         html = HTML_REMOVED_TEXT
>
> I do agree though that perhaps escaping html would be a better default. (Please do file a bug on sourceforge so that I don't forget to make this change later.) In the long term, perhaps, the new and more flexible way of managing pre-post-etc-processors would solve this problem as well.
>
> > implementation removes HTML . (like why HTML is escaped in code blocks and not fully removed) ..
>
> An oversight on my part...
>
> > P.S.: @Yuri Takhteyev: i guess you don't really care any more since you've already put up a wiki .. but anyway .. http://sct.sphene.net/ is my wiki based on python-markdown (and django)
>
> I will stick with what I installed, but I do _care_ - it's good to have a Wiki based on this module. Please add your project to the wiki under "Related Projects".
>
> - yuri
>
> --
> http://www.freewisdom.org/

--
----
Waylan Limberg
wa...@gm...

From: Waylan L. <wa...@gm...> - 2007-11-03 21:18:43
|
Wanted to copy everyone on a discussion between Yuri and myself regarding the purpose and behavior of the reset method. The bug that sparked this discussion has since been fixed in svn, but I thought Yuri's explanation (included below) of reset was very helpful. It occurs to me that this may be good to add to the documentation in some form on the "Writing Extensions" page. Anyone else have any thoughts, additions, etc.? On 11/3/07, Yuri Takhteyev <qar...@gm...> wrote: > The intended behavior was the following: calling reset() may be > necessary depending on the extensions that you use, but should not be > required in general. So, in this case it is a bug. > > Ideally, the user will reset on individual extensions, when needed, > which may not be after every conversion. E.g., suppose you want to > convert several "books", each with several chapters, where each > chapter is a separate markdown file. Suppose you want all footnotes > for all chapters of each book to be rendered as endnotes, as a > separate file in that book. So, you write an endnote extension and > use it as follows: > > for book in books: > md.extensions["endnotes"].reset() # can't remember the exact syntax > for chapter in chapter.get_books(): > converted_chapter = md.convert(chapter) > converted_chapter.save() > book.endnotes = md.extensions["endnotes"].get_endnotes() > > So, in this case you reset the extension, process _several_ files, > then reset it. > > reset() is just a shortcut to reset all extensions - if you want to > make sure that you start with a clean slate. > > All that said, if you don't have any extensions, convert() shouldn't > change the state of anything, so what Trent reported _is_ a bug. > > Oh, and yes, we should discuss this on the list. You can either post > your question there and I will re-post the answer, or you can forward > my response, or whatever. 
> > - yuri > > On Nov 3, 2007 1:08 PM, Waylan Limberg <wa...@gm...> wrote: > > Yuri, > > > > Could you clarify the intended behavior here. My assumption is that > > md.reset should be called between each call to md.convert and that > > this is not a bug, just a documentation issue. If you can confirm, > > I'll update the docs. > > > > Although, if we are depreciating/removing all other methods and > > md.convert will be the only way to pass source text in, perhaps > > convert should automatically call reset each time. Perhaps that's a > > question to pose to the list. > > > > ---------- Forwarded message ---------- > > From: SourceForge.net <no...@so...> > > Date: Nov 3, 2007 1:04 PM > > Subject: [ python-markdown-Bugs-1825231 ] repeated md.convert() can > > give incorrect results > > To: no...@so... > > > > > > Bugs item #1825231, was opened at 2007-11-03 17:04 > > Message generated for change (Tracker Item Submitted) made by Item Submitter > > You can respond by visiting: > > https://sourceforge.net/tracker/?func=detail&atid=790198&aid=1825231&group_id=153041 > > > > Please note that this message will contain a full copy of the comment thread, > > including the initial issue submission, for this request, > > not just the latest update. > > Category: None > > Group: Markdown Core > > Status: Open > > Resolution: None > > Priority: 5 > > Private: No > > Submitted By: Trent Mick (tmick) > > Assigned to: Nobody/Anonymous (nobody) > > Summary: repeated md.convert() can give incorrect results > > > > Initial Comment: > > This is for markdown.py version: > > version = "1.6b" > > version_info = (1,6,2,"rc-2") > > > > > > A problem can occur when you have code that creates a > > Markdown instance and calls .convert() on it more than once, e.g.: > > > > m = Markdown() > > m.convert(some_text) > > m.convert(some_other_text) > > > > Currently, if "some_other_text" is the empty string, then > > m.convert(some_other_text) will return the results for > > m.convert(some_text). 
> > > > Or is it the intention that one must call: > > > > m.reset() > > > > before calling m.convert() again? > > > > If so, perhaps the short example in the module docstring could be > > updated to indicate that. > > > > (Aside: the module docstring usage snippet is wrong. It creates a > > 'Markdown' instance as variable 'md' but then doesn't use 'md' for the > > '.convert()' call.) > > > > > > > > > > ---------------------------------------------------------------------- > > > > You can respond by visiting: > > https://sourceforge.net/tracker/?func=detail&atid=790198&aid=1825231&group_id=153041 > > > > > > -- > > ---- > > Waylan Limberg > > wa...@gm... > > > > > > -- > Yuri Takhteyev > Ph.D. Candidate, UC Berkeley School of Information > http://takhteyev.org/, http://www.freewisdom.org/ > -- ---- Waylan Limberg wa...@gm... |
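For the "Writing Extensions" page, the book/chapter pattern Yuri describes could be illustrated roughly like this. This is only a sketch in modern Python: `EndnoteCollector` and `Converter` are hypothetical stand-ins written for the example, not the real Markdown class or a real extension, and the "note:" line syntax is invented to keep it short.

```python
class EndnoteCollector:
    """Hypothetical stateful extension: accumulates notes across convert() calls."""

    def __init__(self):
        self.notes = []

    def reset(self):
        # Forget everything collected so far.
        self.notes = []


class Converter:
    """Hypothetical stand-in for a Markdown instance with named extensions."""

    def __init__(self, extensions):
        self.extensions = extensions

    def reset(self):
        # Shortcut: reset every registered extension at once.
        for extension in self.extensions.values():
            extension.reset()

    def convert(self, text):
        # Lines starting with "note:" go to the endnote extension;
        # everything else is passed through as the chapter body.
        body = []
        for line in text.splitlines():
            if line.startswith("note:"):
                self.extensions["endnotes"].notes.append(line[len("note:"):].strip())
            else:
                body.append(line)
        return "\n".join(body)


books = {
    "book1": ["Chapter one.\nnote: first", "Chapter two.\nnote: second"],
    "book2": ["Only chapter.\nnote: third"],
}

md = Converter({"endnotes": EndnoteCollector()})
converted_books = {}
endnotes_by_book = {}
for title, chapters in books.items():
    # Reset once per book, not once per chapter, so notes span all chapters.
    md.extensions["endnotes"].reset()
    converted_books[title] = [md.convert(chapter) for chapter in chapters]
    endnotes_by_book[title] = list(md.extensions["endnotes"].notes)
```

The point of the sketch is only the placement of reset(): state is meant to survive between convert() calls within a book and is cleared at the book boundary.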
From: Yuri T. <qar...@gm...> - 2007-10-31 08:45:19
|
I mentioned in another email today that we plan to move the site to markdown.freewisdom.org, which is already up and is running an infogami wiki. However, I ran into a few issues customizing that wiki, so the migration might have to wait until those issues get resolved. Meanwhile, I upgraded my own wiki at the current project website (http://www.freewisdom.org/projects/python-markdown/), so all of you should be able to edit it. So, go ahead and enjoy. We'll then move the content over later, when I am done with customization. There is a remaining bug with login, which however has a simple work-around. The cookies are not working right, so logging in only works for one page view. So, put your user name and password at the bottom of the edit page, hit save, and the wiki will log you in, save your changes, and then immediately forget who you are. But it _will_ save your changes. And if you let your browser remember the password you won't even notice that you've been forgotten. To create an account just log in with a new user name. If you find any problems with the wiki, email me directly. - yuri -- Yuri Takhteyev http://www.freewisdom.org/ |
From: Waylan L. <wa...@gm...> - 2007-10-31 01:36:18
|
On 10/30/07, Kent Johnson <ke...@td...> wrote: > Yuri Takhteyev wrote: > > > I want to stick with a simple rule: if it's a string, then > > its unicode. > > > > So, I think we should offer the following functions: > > > > 1. unicode text -> unicode html > > Hmm...one problem with this (and Waylan's suggestion of making the > encoding parameter to markdown() do something useful) is that until > 1.6b, markdown() did in fact work perfectly well with encoded text and > it was not at all clear that this was not the intended usage. When 1.6b > came out I just commented out the call to removeBOM(), complained to the > list, and continued on my way. > > I use markdown from Django with the markdown support included with > Django; presumably many other people are also. For example: > http://www.freewisdom.org/projects/python-markdown/Django > > which is based on this post by Waylan: > http://achinghead.com/archive/70/django-blog-and-markdown/ > > which is pretty close to the current form of the Django markdown filter. > You almost have a point. In fact, I was about to make the same argument. Then I remembered that that was before the unicode branch was merged in Django. Ticket 2910 [1] needs to be updated for this and hasn't been. Well, the latest patch does try to address it, but I was never convinced it was right. Yuri's clarifications make it clear what needs to happen in that patch. We make sure we have unicode to pass in (Django has the mechanisms to force the issue) and so we should always get unicode out. [1]: http://code.djangoproject.com/attachment/ticket/2910/2910-2.diff BTW, I consider ticket 2910 the most up-to-date approach to Django integration. The Markdown docs should probably be updated. I suppose those still using pre-unicode versions of Django could have issues. But if you're not updating Django, then I wouldn't expect you to update its dependencies either. -- ---- Waylan Limberg wa...@gm... |
From: Yuri T. <qar...@gm...> - 2007-10-30 22:34:57
|
> Hmm...one problem with this (and Waylan's suggestion of making the > encoding parameter to markdown() do something useful) is that until > 1.6b, markdown() did in fact work perfectly well with encoded text and > it was not at all clear that this was not the intended usage. When 1.6b > came out I just commented out the call to removeBOM(), complained to the > list, and continued on my way. Good point... But I think that was a mistake, which needs to be corrected. At the very least, I don't want any new users to use it that way. So, the question is: what would be a good balance between fixing this problem and not screwing existing users? I suggest releasing 1.7 with all of Waylan's recent fixes and this change and putting a clear message in release notes that in 1.7 markdown.markdown() expects unicode and that if you've got utf8-encoded strings, then you should call it with markdown.markdown(input.decode("utf8")) > I use markdown from Django with the markdown support included with > Django; presumably many other people are also. For example: > http://www.freewisdom.org/projects/python-markdown/Django I haven't touched Django for some time, so I am not sure what it does with unicode today. I remember that in December 2006 it was a mess. At that time they did pass encoded bytestrings around. At this point they seem to give you an option of either using bytestrings or unicode (http://www.djangoproject.com/documentation/unicode/). I think the thing to do here is to write a new version of the plugin, which would check if the input is unicode, and if not would decode it from utf8 before sending it to markdown. People who update to 1.7 will also need to update to the new plugin, which doesn't seem to be so bad. We should probably also send the new plugin to the Django team and ask them to include it instead of the old one. (BTW, does Django actually include markdown or just the plugin?) Perhaps the django plugin should be included with the markdown release?
We should probably also put it in SVN. - yuri -- Yuri Takhteyev http://www.freewisdom.org/ |
From: Kent J. <ke...@td...> - 2007-10-30 22:05:12
|
Yuri Takhteyev wrote: > I want to stick with a simple rule: if it's a string, then > its unicode. > > So, I think we should offer the following functions: > > 1. unicode text -> unicode html Hmm...one problem with this (and Waylan's suggestion of making the encoding parameter to markdown() do something useful) is that until 1.6b, markdown() did in fact work perfectly well with encoded text and it was not at all clear that this was not the intended usage. When 1.6b came out I just commented out the call to removeBOM(), complained to the list, and continued on my way. I use markdown from Django with the markdown support included with Django; presumably many other people are also. For example: http://www.freewisdom.org/projects/python-markdown/Django which is based on this post by Waylan: http://achinghead.com/archive/70/django-blog-and-markdown/ which is pretty close to the current form of the Django markdown filter. Kent |
From: Yuri T. <qar...@gm...> - 2007-10-30 20:45:23
|
> > However, I see markdown.markdown() as a shortcut for the common case. > > So maybe we could add some basic encoding/decoding for common cases. > > Seems reasonable. Well, except that what I learned the hard way while adding unicode support to MD is that there seems to be only one "right" way to work with unicode in Python: decode when you read the file and encode when you write. Once you've got encoded strings flying around, it's a recipe for problems. So, I don't want to endorse passing encoded strings as "the common case." In most cases, reading the content of a file without decoding is a bad idea and I don't want to encourage people to do that. Instead, I want to stick with a simple rule: if it's a string, then it's unicode. So, I think we should offer the following functions: 1. unicode text -> unicode html 2. file path for input, encoding -> unicode html 3. file path for input, encoding, file path for output -> (writes to file) I see markdown.markdown() as doing #1. markdown.markdownFromFile() now does #3. We _could_ change it to also do #2. We could make it always return the unicode string, and also write encoded output to "output" if that argument is set. (We should probably accept either a file name or a stream as that parameter.) Now, if people feel that there is a common (if ungodly) case when the user needs to deal with incoming encoded strings, I suggest we add a new method for that: markdownFromEncodedString() which will do decoding and return unicode. Though, in that case it should really be enough to write markdown.markdown(unicode(my_ungodly_string, "utf8")) So, I am not sure if such a method is really needed. > I think *two* encodings is overkill for both markdown() and > markdownFromFile(). In the common case they will likely be the same and > it is so easy to do the conversion yourself if you want them to be > different. Again, markdown() will no longer have encoding.
As to the second, I tend to agree, especially if markdownFromFile could return the unicode instead of writing it to a file. > I hope by 'fails gracefully' you mean 'raises UnicodeDecodeError'. What > else could you do? Start guessing encodings? I think we should raise an error. The only question is: should we return a better error message. > if encoding is not None: > text = text.decode(encoding) > converted = md.convert(text) > if encoding is not None: > converted = converted.encode(encoding) > return converted Again, I would really rather stick with a simple rule of "files are encoded, strings are unicode" and banish encoded strings completely. Otherwise keeping track of what is and what is not unicode becomes a huge headache. It also becomes hard to explain to other people what exactly we are doing. The only place where .encode() appears now is in sys.stdout.write(new_text.encode(encoding)) Note that in this case I do the conversion without saving the encoded string on purpose. If sys.stdout.write wants an encoded string, that's fine - I'll give it to it, but I don't want to have any encoded strings sticking around. If I had to keep them for any reasons, I would make sure to prefix them with "encoded_" - yuri -- Yuri Takhteyev http://www.freewisdom.org/ |
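The "files are encoded, strings are unicode" rule from this message can be sketched as follows. This is an illustration in modern Python, where `str` plays the role that `unicode` plays in the thread and `bytes` is the "encoded string". The `convert()` function is a hypothetical stand-in for md.convert(), and `convert_file()` is an invented helper name, not part of the markdown module.

```python
import os
import tempfile


def convert(text):
    """Hypothetical stand-in for md.convert(): text in, text out."""
    if not isinstance(text, str):
        raise TypeError("convert() expects decoded text, not bytes")
    return "<p>%s</p>" % text


def convert_file(in_path, out_path, encoding="utf-8"):
    # Decode exactly once, at the point where the file is read.
    with open(in_path, "r", encoding=encoding) as source:
        text = source.read()
    html = convert(text)
    # Encode exactly once, at the point where the file is written.
    with open(out_path, "w", encoding=encoding) as target:
        target.write(html)
    # Also return the decoded result, as suggested for markdownFromFile().
    return html


workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "test.txt")
dst = os.path.join(workdir, "test.html")
with open(src, "wb") as handle:
    handle.write("caf\u00e9".encode("utf-8"))

html = convert_file(src, dst)
with open(dst, "rb") as handle:
    raw = handle.read()
```

No encoded value is ever held in a variable between the read and the write, which is the property the rule is meant to guarantee.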
From: Kent J. <ke...@td...> - 2007-10-30 20:11:43
|
Waylan Limberg wrote: > However, I see markdown.markdown() as a shortcut for the common case. > So maybe we could add some basic encoding/decoding for common cases. Seems reasonable. > The user must pass in the encoding (and perhaps an optional output > encoding) I think *two* encodings is overkill for both markdown() and markdownFromFile(). In the common case they will likely be the same and it is so easy to do the conversion yourself if you want them to be different. > and, assuming the encoding actually matches (the user's > responsibility - otherwise it fails gracefully) I hope by 'fails gracefully' you mean 'raises UnicodeDecodeError'. What else could you do? Start guessing encodings? > things work fine. If > the user has a situation that doesn't fit the common case, then we > would expect that the encoding/decoding will be done manually with > md.convert. As long as the differences are clearly documented, that > should work fine. Of course, the more I think about this, the more it > feels like extra work I don't want to do. Any input? It's easy. In markdown() change return md.convert(text) to if encoding is not None: text = text.decode(encoding) converted = md.convert(text) if encoding is not None: converted = converted.encode(encoding) return converted Kent |
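Laid out as a complete function, the change proposed above might look like the sketch below. It is written for modern Python (bytes in, bytes out when an encoding is given), and `convert()` is a hypothetical stand-in for md.convert(), so this shows the shape of the wrapper rather than the module's actual code.

```python
def convert(text):
    """Hypothetical stand-in for md.convert(): text in, text out."""
    return "<p>" + text + "</p>"


def markdown(text, encoding=None):
    # With an encoding, the caller passes encoded bytes: decode them first.
    # A mismatched encoding raises UnicodeDecodeError here; nothing is guessed.
    if encoding is not None:
        text = text.decode(encoding)
    converted = convert(text)
    # Hand the result back in the same encoding the caller supplied.
    if encoding is not None:
        converted = converted.encode(encoding)
    return converted


plain = markdown("caf\u00e9")
encoded = markdown("caf\u00e9".encode("utf-8"), encoding="utf-8")
```

With encoding=None the function stays text-in, text-out, so callers who already decode their input are unaffected.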
From: Yuri T. <qar...@gm...> - 2007-10-30 20:09:43
|
I am curious if anyone else has opinions on that? First, on Bazaar vs. SVN. I personally don't have much invested in SVN and don't care that much. What I worry about is: will moving to Bazaar raise the bar for other people who want to check out? In theory, it seems, Bazaar is specifically designed to make it easier for new people to join in. On the other hand, I am wondering if people might be turned off by having to install a new VCS. Second, any comments on Roundup and Launchpad? Launchpad seems to have a nice community thing going, so if we want to switch to Bazaar this might be a nice option. - yuri On 10/30/07, Waylan Limberg <wa...@gm...> wrote: > > I haven't looked at everything out there (the list is long), but I > have looked at a few options that seem like they should work. > > I see 3 basic categories: > 1. host your own, > 2. simple - hosted elsewhere, and > 3. full-featured - hosted elsewhere. > > Here's my pick in each category: > > 1. If you want to host our own solution and have total control, > Roundup [1] seems like a good option. It's actually what Python uses > [2] for a tracker, so there should be plenty of easy to get support > for years to come. The package comes with a demo app that runs as > localhost if you want to play around (I did briefly). Of course, > going this route will require more work to get things configured > up-front, but once that's done, it should be easy going. If we go this > route, you'd have to set things up on your server. The docs do > provide instructions [8] for various hosting environments. > > 2. For a simple hosted solution, I would recommend Google Code [3]. > I've used it to report bugs and it was painless. I just set up a > project of my own [4] with it and am impressed with the simplicity of > it all. I'll take this any day over SourceForge. This is probably the > easiest point of entry. I'll even volunteer to set it up if you want > to go this way.
I've transferred svn repos from one server to another > preserving history before, so I'd be willing to give that a shot as > well. > > 3. For a more direct replacement of SourceForge with a large > community, lots of projects, etc, Launchpad [5] would fill that bill > nicely. They have a nice tour [6] that summarizes their features. We > would have to switch from svn to bzr [7] though. That's not a problem > for me (I'd prefer it), but my guess is fewer users already have bzr > installed. On a side note, this site was snappy earlier today, but > while retrieving the links just now, it was **really** slow. > > [1]: http://roundup.sourceforge.net/ > [2]: http://bugs.python.org/ > [3]: http://code.google.com/hosting/ > [4]: http://code.google.com/p/wlpages/ > [5]: https://launchpad.net/ > [6]: https://launchpad.net/+tour > [7]: http://bazaar-vcs.org/ > [8]: http://roundup.sourceforge.net/doc-1.0/installation.html#installation > -- > ---- > Waylan Limberg > wa...@gm... > -- Yuri Takhteyev http://www.freewisdom.org/ |
From: Waylan L. <wa...@gm...> - 2007-10-30 20:05:19
|
On 10/30/07, Waylan Limberg <wa...@gm...> wrote: > On 10/30/07, Yuri Takhteyev <qar...@gm...> wrote: > > I haven't had a chance to look at the specific problem, but in > > general, here how it is _supposed_ to work. > > Ahh, well that clears a few things up for me. Thanks for the explaination= . > > Now for my proposal: > > Lets leave md.convert the way it is. The user has to convert to > unicode first and then get unicode back in his code which he can do > with as he pleases. > > However, I see markdown.markdown() as a shortcut for the common case. > So maybe we could add some basic encoding/decoding for common cases. > The user must pass in the encoding (and perhaps an optional output > encoding) and, assuming the encoding actually matches (the users > responsability - otherwise it fails gracefully) things work fine. If > the user has a situation that doesn't fit the common case, then we > would expect that the encoding/decoding will be done manually with > md.convert. As long as the differances are clearly documented, that > should work fine. Of course, the more I think about this, the more is > feels like extra work I don't want to do. Any input? I should mention that I see all this happening in the markdown() function itself, not as part of Markdown. Markdown.__init__ or Markdown.convert will always get unicode. > > Obviously, markdownFromFile is a differant animal and should work as > you proposed. > > > > > > The Markdown class is unicode-in-unicode-out. It can take a simple > > string as input, but one should never pass an encoded string to it, be > > it utf8 or whatever. > > It's the callers responsibility to decode their text into unicode from > > utf8 or whatever it is that they have it encoded as, and they can then > > encode the output into whatever encoding they want. Then I got a > > patch for removing BOM and integrated it without thinking, which > > required passing "encoding" to it. Looking at it now I realize that > > that was quite stupid. 
Since removeBOM() should never get encoded > > strings, should _assume_ that the input is unicode, so presumably it > > should suffice to have: > > > > def removeBOM(text, encoding): > > return text.lstrip(u'\ufeff') > > > > In fact, we should just get rid of this function and put > > text.lstrip(u'\ufeff') in the place where it is called. (BTW, should > > we put it back into the output?) > > > > Again, if you are using markdown as a module, you should decode your > > content yourself, run it through md.convert(), and then use the > > resulting unicode as you wish: > > > > input_file =3D codecs.open("test.txt", mode=3D"r", encoding=3D"utf= 16") > > text =3D input_file.read() > > html_unicode =3D Markdown.markdown(text, extensions) > > output_file =3D codecs.open("test.html", "w", encoding=3D"utf8") > > output_file.write(html_unicode) > > > > Perhaps we should raise an error if we get an encoded string? I.e., > > check that either the string is of type unicode _or_ it has no special > > characters. > > > > Markdown.markdown does have an obvious bug in that it accepts an > > encoding argument and doesn't pass it to Markdown.__init__. I suppose > > we should just get of this parameter altogether. > > > > There is also another utility function - markdownFromFile. This one > > does the encoding and decoding for you. For simplicity, it uses only > > one encoding argument, which is used for both decoding the input and > > encoding output. I suppose that this might be confusing. Should we > > add an extra argument "output_encoding" making it optional? 
I.e.: > > > > def markdownFromFile(input = None, > > output = None, > > extensions = [], > > encoding = None, > > output_encoding = None, > > message_threshold = CRITICAL, > > safe = False) : > > if not output_encoding: > > output_encoding = encoding > > > > I must admit here that I just went to look at the documentation on the > > wiki and am realizing that that's what is responsible for much of the > > confusion. We have a new wiki at http://markdown.freewisdom.org/ and > > I am slowly moving content there. In particular, I copied over the > > content of http://markdown.freewisdom.org/Using_as_a_Module and > > updated it with the example above. > > > > We should perhaps create a page called "BOMs" to archive there the > > design decisions related to BOM removal, etc. > > > > - yuri > > > > > > On 10/30/07, Waylan Limberg <wa...@gm...> wrote: > > > Kent, thanks for the info. We'll look at this further. > > > > > > On 10/30/07, Kent Johnson <ke...@td...> wrote: > > > > Waylan Limberg wrote: > > > > > Kent, > > > > > > > > > > Could you verify that revision 46 fixes the problem for you? > > > > > > > > It will fix my problem but it won't work correctly with all unicode > > > > text. For example if the original text contains a BOM and it is > > > > converted with utf-16be or utf-16le encoding then the unicode string > > > > still contains a BOM which will not be removed by this patch. > > > > > > My testing shows this works with utf-16. Could you provide a simple test case? > > > > > > > > > > > Also it still seems a bit strange that the encoding argument to > > > > markdown() is not used at all and the encoding argument to > > > > Markdown.__init__() is the encoding that the data was in *before* it was > > > > converted to unicode.
> > > > > > > > I would write removeBOM() as > > > > > > > > def removeBOM(text, encoding): > > > > if isinstance(text, unicode): > > > > boms =3D [u'\ufeff'] > > > > else: > > > > boms =3D BOMS[encoding] > > > > for bom in boms: > > > > if text.startswith(bom): > > > > return text.lstrip(bom) > > > > return text > > > > > > > > and I would change the rest of the code to use encoding=3DNone when= the > > > > text is actually unicode. > > > > > > > > Kent > > > > > > > > > > > > > > We can thank the very smart Malcolm Tredinnick for providing a pa= tch. > > > > > See bug report [1817528] for more. > > > > > > > > > > On 9/12/07, Kent Johnson <ke...@td...> wrote: > > > > >> Hi, > > > > >> > > > > >> Markdown 1.6b doesn't work with UTF-8-encoded text. It fails wit= h a > > > > >> UnicodeDecodeError in removeBOM(): > > > > >> > > > > >> In [3]: import markdown > > > > >> In [4]: text =3D u'\xe2'.encode('utf-8') > > > > >> In [6]: print text > > > > >> =E2 > > > > >> In [7]: print markdown.markdown(text) > > > > >> ------------------------------------------------------------ > > > > >> Traceback (most recent call last): > > > > >> File "<ipython console>", line 1, in <module> > > > > >> File > > > > >> "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5= /site-packages/markdown.py", > > > > >> line 1722, in markdown > > > > >> return md.convert(text) > > > > >> File > > > > >> "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5= /site-packages/markdown.py", > > > > >> line 1614, in convert > > > > >> self.source =3D removeBOM(self.source, self.encoding) > > > > >> File > > > > >> "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5= /site-packages/markdown.py", > > > > >> line 74, in removeBOM > > > > >> if text.startswith(bom): > > > > >> <type 'exceptions.UnicodeDecodeError'>: 'ascii' codec can't deco= de byte > > > > >> 0xc3 in position 0: ordinal not in range(128) > > > > >> > > > > >> The problem is that the BOM being tested is 
unicode so to execut= e > > > > >> text.startswith(bom) > > > > >> Python tries to convert text to Unicode using the default encodi= ng > > > > >> (ascii). This fails because the text is not ascii. > > > > >> > > > > >> I'm trying to understand what the encoding parameter is for; it = doesn't > > > > >> seem to do much. There also seems to be some confusion with the = use of > > > > >> encoding in markdownFromFile() vs markdown(); the file is conver= ted to > > > > >> Unicode on input so I don't understand why the same encoding par= ameter > > > > >> is passed to markdown()? > > > > >> > > > > >> ISTM the encoding passed to markdown should match the encoding o= f the > > > > >> text passed to markdown, and the values in the BOMS table should= be in > > > > >> the encoding of the key, not in unicode. Then the __unicode__() = method > > > > >> should actually decode. Or is the intent that the text passed to > > > > >> markdown() should always be ascii or unicode? > > > > >> > > > > >> I can put together a patch if you like but I wanted to make sure= that I > > > > >> am not missing some grand plan... > > > > >> > > > > >> Kent > > > > >> > > > > >> ----------------------------------------------------------------= --------- > > > > >> This SF.net email is sponsored by: Microsoft > > > > >> Defy all challenges. Microsoft(R) Visual Studio 2005. > > > > >> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ > > > > >> _______________________________________________ > > > > >> Python-markdown-discuss mailing list > > > > >> Pyt...@li... > > > > >> https://lists.sourceforge.net/lists/listinfo/python-markdown-dis= cuss > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > ---- > > > Waylan Limberg > > > wa...@gm... > > > > > > ---------------------------------------------------------------------= ---- > > > This SF.net email is sponsored by: Splunk Inc. > > > Still grepping through log files to find problems? Stop. 
> > > Now Search log events and configuration files using AJAX and a browser. > > > Download your FREE copy of Splunk now >> http://get.splunk.com/ > > > _______________________________________________ > > > Python-markdown-discuss mailing list > > > Pyt...@li... > > > https://lists.sourceforge.net/lists/listinfo/python-markdown-discuss > > > > > > > > > -- > > Yuri Takhteyev > > Ph.D. Candidate, UC Berkeley School of Information > > http://takhteyev.org/, http://www.freewisdom.org/ > > > > > -- > ---- > Waylan Limberg > wa...@gm... > -- ---- Waylan Limberg wa...@gm... |
From: Waylan L. <wa...@gm...> - 2007-10-30 20:01:43
|
On 10/30/07, Yuri Takhteyev <qar...@gm...> wrote: > I haven't had a chance to look at the specific problem, but in > general, here how it is _supposed_ to work. Ahh, well that clears a few things up for me. Thanks for the explaination. Now for my proposal: Lets leave md.convert the way it is. The user has to convert to unicode first and then get unicode back in his code which he can do with as he pleases. However, I see markdown.markdown() as a shortcut for the common case. So maybe we could add some basic encoding/decoding for common cases. The user must pass in the encoding (and perhaps an optional output encoding) and, assuming the encoding actually matches (the users responsability - otherwise it fails gracefully) things work fine. If the user has a situation that doesn't fit the common case, then we would expect that the encoding/decoding will be done manually with md.convert. As long as the differances are clearly documented, that should work fine. Of course, the more I think about this, the more is feels like extra work I don't want to do. Any input? Obviously, markdownFromFile is a differant animal and should work as you proposed. > > The Markdown class is unicode-in-unicode-out. It can take a simple > string as input, but one should never pass an encoded string to it, be > it utf8 or whatever. > It's the callers responsibility to decode their text into unicode from > utf8 or whatever it is that they have it encoded as, and they can then > encode the output into whatever encoding they want. Then I got a > patch for removing BOM and integrated it without thinking, which > required passing "encoding" to it. Looking at it now I realize that > that was quite stupid. 
Since removeBOM() should never get encoded > strings, should _assume_ that the input is unicode, so presumably it > should suffice to have: > > def removeBOM(text, encoding): > return text.lstrip(u'\ufeff') > > In fact, we should just get rid of this function and put > text.lstrip(u'\ufeff') in the place where it is called. (BTW, should > we put it back into the output?) > > Again, if you are using markdown as a module, you should decode your > content yourself, run it through md.convert(), and then use the > resulting unicode as you wish: > > input_file =3D codecs.open("test.txt", mode=3D"r", encoding=3D"utf16= ") > text =3D input_file.read() > html_unicode =3D Markdown.markdown(text, extensions) > output_file =3D codecs.open("test.html", "w", encoding=3D"utf8") > output_file.write(html_unicode) > > Perhaps we should raise an error if we get an encoded string? I.e., > check that either the string is of type unicode _or_ it has no special > characters. > > Markdown.markdown does have an obvious bug in that it accepts an > encoding argument and doesn't pass it to Markdown.__init__. I suppose > we should just get of this parameter altogether. > > There is also another utility function - markdownFromFile. This one > does the encoding and decoding for you. For simplicity, it uses only > one encoding argument, which is used for both decoding the input and > encoding output. I suppose that this might be confusing. Should we > add an extra argument "output_encoding" making it optional? I.e.: > > def markdownFromFile(input =3D None, > output =3D None, > extensions =3D [], > encoding =3D None, > output_encoding =3D None, > message_threshold =3D CRITICAL, > safe =3D False) : > if not output_encoding: > output encoding =3D encoding > > I must admit here that I just went to look at the documentation on the > wiki and am realizing that that's what is responsible for much of the > confusion. 
We have a new wiki at http://markdown.freewisdom.org/ and
> I am slowly moving content there. In particular, I copied over the
> content of http://markdown.freewisdom.org/Using_as_a_Module and
> updated it with the example above.
>
> We should perhaps create a page called "BOMs" to archive there the
> design decisions related to BOM removal, etc.
>
> - yuri
>
>
> On 10/30/07, Waylan Limberg <wa...@gm...> wrote:
> > Kent, thanks for the info. We'll look at this further.
> >
> > On 10/30/07, Kent Johnson <ke...@td...> wrote:
> > > Waylan Limberg wrote:
> > > > Kent,
> > > >
> > > > Could you verify that revision 46 fixes the problem for you?
> > >
> > > It will fix my problem but it won't work correctly with all unicode
> > > text. For example if the original text contains a BOM and it is
> > > converted with utf-16be or utf-16le encoding then the unicode string
> > > still contains a BOM which will not be removed by this patch.
> >
> > My testing shows this works with utf-16. Could you provide a simple
> > test case?
> >
> > >
> > > Also it still seems a bit strange that the encoding argument to
> > > markdown() is not used at all and the encoding argument to
> > > Markdown.__init__() is the encoding that the data was in *before* it was
> > > converted to unicode.
> > >
> > > I would write removeBOM() as
> > >
> > > def removeBOM(text, encoding):
> > >     if isinstance(text, unicode):
> > >         boms = [u'\ufeff']
> > >     else:
> > >         boms = BOMS[encoding]
> > >     for bom in boms:
> > >         if text.startswith(bom):
> > >             return text.lstrip(bom)
> > >     return text
> > >
> > > and I would change the rest of the code to use encoding=None when the
> > > text is actually unicode.
> > >
> > > Kent
> > >
> > > >
> > > > We can thank the very smart Malcolm Tredinnick for providing a patch.
> > > > See bug report [1817528] for more.
> > > >
> > > > On 9/12/07, Kent Johnson <ke...@td...> wrote:
> > > >> Hi,
> > > >>
> > > >> Markdown 1.6b doesn't work with UTF-8-encoded text. It fails with a
> > > >> UnicodeDecodeError in removeBOM():
> > > >>
> > > >> In [3]: import markdown
> > > >> In [4]: text = u'\xe2'.encode('utf-8')
> > > >> In [6]: print text
> > > >> â
> > > >> In [7]: print markdown.markdown(text)
> > > >> ------------------------------------------------------------
> > > >> Traceback (most recent call last):
> > > >>   File "<ipython console>", line 1, in <module>
> > > >>   File
> > > >> "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/markdown.py",
> > > >> line 1722, in markdown
> > > >>     return md.convert(text)
> > > >>   File
> > > >> "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/markdown.py",
> > > >> line 1614, in convert
> > > >>     self.source = removeBOM(self.source, self.encoding)
> > > >>   File
> > > >> "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/markdown.py",
> > > >> line 74, in removeBOM
> > > >>     if text.startswith(bom):
> > > >> <type 'exceptions.UnicodeDecodeError'>: 'ascii' codec can't decode byte
> > > >> 0xc3 in position 0: ordinal not in range(128)
> > > >>
> > > >> The problem is that the BOM being tested is unicode so to execute
> > > >> text.startswith(bom)
> > > >> Python tries to convert text to Unicode using the default encoding
> > > >> (ascii). This fails because the text is not ascii.
> > > >>
> > > >> I'm trying to understand what the encoding parameter is for; it doesn't
> > > >> seem to do much. There also seems to be some confusion with the use of
> > > >> encoding in markdownFromFile() vs markdown(); the file is converted to
> > > >> Unicode on input so I don't understand why the same encoding parameter
> > > >> is passed to markdown()?
> > > >>
> > > >> ISTM the encoding passed to markdown should match the encoding of the
> > > >> text passed to markdown, and the values in the BOMS table should be in
> > > >> the encoding of the key, not in unicode. Then the __unicode__() method
> > > >> should actually decode. Or is the intent that the text passed to
> > > >> markdown() should always be ascii or unicode?
> > > >>
> > > >> I can put together a patch if you like but I wanted to make sure that I
> > > >> am not missing some grand plan...
> > > >>
> > > >> Kent
> >
> >
> > --
> > ----
> > Waylan Limberg
> > wa...@gm...
>
>
> --
> Yuri Takhteyev
> Ph.D. Candidate, UC Berkeley School of Information
> http://takhteyev.org/, http://www.freewisdom.org/
>

--
----
Waylan Limberg
wa...@gm... |
From: Kent J. <ke...@td...> - 2007-10-30 19:53:18
|
Yuri Takhteyev wrote:
> The Markdown class is unicode-in-unicode-out. It can take a simple
> string as input, but one should never pass an encoded string to it, be
> it utf8 or whatever.

> Since removeBOM() should never get encoded
> strings, we should _assume_ that the input is unicode, so presumably it
> should suffice to have:
>
> def removeBOM(text, encoding):
>     return text.lstrip(u'\ufeff')

Sounds good to me.

> In fact, we should just get rid of this function and put
> text.lstrip(u'\ufeff') in the place where it is called. (BTW, should
> we put it back into the output?)

Yes, and get rid of the encoding parameter to markdown() and
Markdown.__init__() which then will not be used at all. That will reduce
the confusion; as the code is written, it is not at all clear that it
expects unicode text only (e.g. the comment mentions "The character
encoding of <text>" which has no meaning if <text> is unicode).

> Perhaps we should raise an error if we get an encoded string? I.e.,
> check that either the string is of type unicode _or_ it has no special
> characters.

Easy to do - just put
    self.source = unicode(source)
in Markdown.__init__()

> Markdown.markdown does have an obvious bug in that it accepts an
> encoding argument and doesn't pass it to Markdown.__init__. I suppose
> we should just get rid of this parameter altogether.

Yes please!

Kent |
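For readers on Python 3, where unicode() no longer exists, the unicode-in-unicode-out rule discussed above could be sketched roughly as follows. This is an illustration under assumptions, not the library's implementation: prepare_source() is a hypothetical helper, and decoding bytes as strict ASCII is only meant to mirror what unicode(source) did in Python 2 (plain ASCII byte strings pass, anything encoded raises).

```python
def prepare_source(source):
    """Return text with one leading BOM removed; reject encoded input."""
    if isinstance(source, bytes):
        # Mirrors Python 2's unicode(source): pure-ASCII bytes are
        # accepted, any other encoding raises UnicodeDecodeError.
        source = source.decode("ascii")
    if source.startswith("\ufeff"):
        # Remove exactly one BOM rather than every leading U+FEFF.
        source = source[1:]
    return source


with_bom = prepare_source("\ufeff# Title")
ascii_bytes = prepare_source(b"plain text")

try:
    prepare_source("\xe2".encode("utf-8"))
    rejected = False
except UnicodeDecodeError:
    rejected = True
```

The slice removes a single BOM, whereas text.lstrip(u'\ufeff') would remove every leading U+FEFF; for ordinary input the two behave the same.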