From: Waylan L. <wa...@gm...> - 2008-07-18 19:25:58
On Fri, Jul 18, 2008 at 3:00 PM, Yuri Takhteyev <qar...@gm...> wrote:
>> I will appreciate any and all feedback.
>
> Great job! Thanks for finding the time!
>
>> Yuri, is there a way to have the "Extension" tab rather than the
>> "Overview" tab highlighted at the top of the page when viewing the
>> pages for the individual extensions?
>
> Yes. For pages that are included in the menu bar themselves this
> happens automatically. For other pages, you can specify which
> second-level menu item should be activated by setting the "Category"
> field. In this case, set "Category" to "Available_Extensions". To set
> the category, click on "Advanced Options" while editing the page.

Thanks, that did it. I've updated all the pages now.

--
----
Waylan Limberg
wa...@gm...
From: Yuri T. <qar...@gm...> - 2008-07-18 19:00:11
> I will appreciate any and all feedback.

Great job! Thanks for finding the time!

> Yuri, is there a way to have the "Extension" tab rather than the
> "Overview" tab highlighted at the top of the page when viewing the
> pages for the individual extensions?

Yes. For pages that are included in the menu bar themselves this
happens automatically. For other pages, you can specify which
second-level menu item should be activated by setting the "Category"
field. In this case, set "Category" to "Available_Extensions". To set
the category, click on "Advanced Options" while editing the page. E.g.:
http://www.freewisdom.org/projects/python-markdown/Footnotes

- yuri

--
http://sputnik.freewisdom.org/
From: John S. <jo...@sz...> - 2008-07-18 09:24:37
Thanks Artem! I would have responded sooner, but I'm not subscribed to
the list. That should be fixed now. :-)

-John
From: Yuri T. <qar...@gm...> - 2008-07-17 23:30:07
> Something quite similar to this was checked in [2] a few months back.
> I considered doing exactly as you suggested, but it seemed a little
> too restrictive so I used Python's url parser to leave a little more
> flexibility. In any event, it is only available in safe_mode. See the
> docstring in the patch for an explanation.

Oh, indeed:

$ cat > test.txt
[foo][alert]
[alert]: javascript:alert(42)
$ python markdown.py -s remove test.txt
<p><a href="">foo</a>

Perhaps what we need is the documentation...

> I'm not completely convinced it covers every possibility. Actually as
> http://ha.ckers.org/xss.html points out, there very well may be as yet
> undiscovered possibilities that we don't know to check for.

Yes, I don't think this is safe - it assumes the behavior of a
standards-compliant browser, but won't prevent some XSS attacks that
target IE6. I think being both flexible and secure is a balancing act
that is best left for a good XSS filter. I don't think it's our job to
write one. (Is there a good one for Python that we can just recommend
for people to use?)

To the extent that we implement a "safe" mode, I think we should go for
the most restrictive approach. If we are not (pretty much) sure that
it's safe, throw it out in safe mode. For me, this means the URL should
start with "http://", "https://", "/" or "#".

- yuri

--
http://sputnik.freewisdom.org/
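[Editorial illustration.] The restrictive rule Yuri proposes here amounts to little more than a prefix check. A minimal sketch of that idea, assuming a plain whitelist of prefixes - the names `SAFE_PREFIXES` and `sanitize_url` are illustrative, not part of Python-Markdown's actual API:

```python
# Illustrative sketch only: in "safe" mode, keep a URL only if it
# starts with a prefix known to be (relatively) safe; drop it otherwise.
SAFE_PREFIXES = ("http://", "https://", "/", "#")

def sanitize_url(url):
    """Return url unchanged if it starts with a safe prefix, else ''."""
    # str.startswith accepts a tuple of prefixes.
    return url if url.startswith(SAFE_PREFIXES) else ""
```

Anything that fails the check - `javascript:` URLs included - simply comes back empty.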
From: Yuri T. <qar...@gm...> - 2008-07-17 23:08:52
> I have markdown installed with the easy_install command directly from
> the cheese shop, though I think I have a recent version (1.7). But
> while trying out your examples above, my installation fails:
>
> >>> import markdown
> >>> markdown.markdown(r'[foo](bar())')
> u'<p><a href="bar(">foo</a>)\n</p>'
> >>> markdown.markdown(r'[foo](bar(\))')
> u'<p>[foo](bar())\n</p>'
>
> Does babelmark have another version installed? They say that they also
> have 1.7

I think you have the right version of Python Markdown but the wrong
(old) version of Perl.

[foo](bar()) => <p><a href="bar(">foo</a>)\n</p>

This is the "bug" that Waylan mentioned. That is, we differ from other
implementations here, in that <p><a href="bar()">foo</a></p> would be
more desirable.

[foo](bar(\)) => <p>[foo](bar())\n</p>

Apart from the \n, this is what all implementations do now. The _old_
version (1.0.1) of Perl Markdown actually generated the link. The new
one (1.0.2b8) does not. See:
http://babelmark.bobtfish.net/?markdown=[foo](bar(\))

PHP Markdown, which for me is a more important reference, produces
this: <p>[foo](bar())</p>

Not to suggest that there is any logic to any of those behaviors.
Markdown syntax clearly doesn't consider parentheses in inline URLs.
So, while it would be nice to be on the same page with other
implementations for the first example, I can't say this bothers me too
much.

- yuri

--
http://sputnik.freewisdom.org/
From: Waylan L. <wa...@gm...> - 2008-07-17 22:26:11
First of all, the parentheses-in-links issue is a known bug with an
existing ticket[1]. A patch is most welcome. That said, Yuri pointed
out a few ways to work around that limitation.

[1]: http://www.freewisdom.org/projects/python-markdown/Tickets/000004

On Thu, Jul 17, 2008 at 5:45 PM, Yuri Takhteyev <qar...@gm...> wrote:
>
> Now, given that we already have a "safe" option that filters out
> user's HTML, I would be open to also stripping out (in "safe" mode)
> any links that do not start with one of a small number of prefixes
> known to be (relatively) safe (e.g., "/", "#", "http://", "https://",
> "mailto://"). However, this would only make sense in "safe" mode,
> when user-supplied HTML is already being removed.
>

Something quite similar to this was checked in [2] a few months back. I
considered doing exactly as you suggested, but it seemed a little too
restrictive so I used Python's url parser to leave a little more
flexibility. In any event, it is only available in safe_mode. See the
docstring in the patch for an explanation.

I'm not completely convinced it covers every possibility. Actually, as
http://ha.ckers.org/xss.html points out, there very well may be as yet
undiscovered possibilities that we don't know to check for. In any
event, for anyone who cares about this issue, that is an interesting
read. If anyone has any improvements and/or suggestions, I'm open.

[2]: http://gitorious.org/projects/python-markdown/repos/mainline/commits/2db5d1c8e469d2943a6a851bc0ff3ede070e448b

--
----
Waylan Limberg
wa...@gm...
From: Yuri T. <qar...@gm...> - 2008-07-17 21:44:58
First, to Gregor:

For [foo](bar(\)), Python Markdown actually behaves just like the most
recent Perl implementation:
http://babelmark.bobtfish.net/?markdown=[foo](bar(\))%0D%0A

For [foo](bar()), Python Markdown gives you different, and arguably
less intelligent, HTML than other implementations:
http://babelmark.bobtfish.net/?markdown=[foo](bar())%0D%0A

However, other implementations only treat URLs with parentheses
intelligently if the parentheses are balanced, and there is a simple
alternative way to link to URLs that have parentheses in them, which is
supported by all implementations:

[foo][bar]
[bar]: http://localhost/bar().html

(see
http://babelmark.bobtfish.net/?markdown=[foo][bar]%0D%0A%0D%0A[bar]%3A+http%3A%2F%2Flocalhost%2Fbar().html%0D%0A)

Yes, this works for Javascript too:

[foo][alert]
[alert]: javascript:alert(42)

http://babelmark.bobtfish.net/?markdown=[foo][alert]%0D%0A%0D%0A[alert]%3A+javascript%3Aalert(42)%0D%0A

Does this allow people to do nasty stuff? Yes. However, the consensus
on the markdown-discuss list seems to be that preventing XSS attacks is
not Markdown's job. The reason is that javascript:alert(42) is just the
tip of the iceberg when it comes to cross-site scripting. If you are
worried about cross-site scripting, you should get a good XSS filter
and run Markdown's output through it. And in my opinion, the only way
to do it right is to parse the output and filter it so that only stuff
that you know is safe passes through. You can't fight XSS by
black-listing a few keywords like "javascript".

Now, given that we already have a "safe" option that filters out the
user's HTML, I would be open to also stripping out (in "safe" mode) any
links that do not start with one of a small number of prefixes known to
be (relatively) safe (e.g., "/", "#", "http://", "https://",
"mailto://"). However, this would only make sense in "safe" mode, when
user-supplied HTML is already being removed.

To Blake:

> http://maps.google.com/maps?f=q&hl=en&geocode=&q=Summerhill+and+MacLennan&sll=43.687177,-79.371672&sspn=0.021661,0.037594&ie=UTF8&t=h&z=16
> and watched it completely fail

In this case, I don't actually see what the problem would be. It seems
to work fine for me.

- yuri

--
http://sputnik.freewisdom.org/
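[Editorial illustration.] The "parse the output and let only known-safe stuff through" approach Yuri recommends can be sketched with the standard library's HTML parser (Python 3 `html.parser` here). The tag and attribute whitelists below are illustrative choices for the sketch, not anything shipped with Python-Markdown, and a production XSS filter would need to handle far more cases:

```python
# Sketch of a whitelist-based output filter: only known tags, known
# attributes, and known-safe URL prefixes survive. Illustrative only.
from html.parser import HTMLParser

ALLOWED_TAGS = {"p", "a", "em", "strong", "code", "pre", "ul", "ol", "li"}
ALLOWED_ATTRS = {"a": {"href"}}
SAFE_PREFIXES = ("http://", "https://", "/", "#")

class WhitelistFilter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # drop the tag entirely (e.g. <script>)
        safe = []
        for name, value in attrs:
            if name not in ALLOWED_ATTRS.get(tag, set()):
                continue
            if name == "href" and not (value or "").startswith(SAFE_PREFIXES):
                continue  # drop unsafe URLs such as javascript:
            safe.append(' %s="%s"' % (name, value))
        self.out.append("<%s%s>" % (tag, "".join(safe)))

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

def filter_html(html):
    f = WhitelistFilter()
    f.feed(html)
    return "".join(f.out)
```

A `javascript:` href is stripped while the surrounding markup survives: `filter_html('<p><a href="javascript:alert(42)">foo</a></p>')` yields `<p><a>foo</a></p>`.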
From: Blake W. <bw...@la...> - 2008-07-17 20:40:53
Gregor Müllegger wrote:
> I tried to post the following snippet:
> [What is the answer?](javascript:alert(42);)
> but this would simply give me
> <p><a href="javascript:alert(1">test</a>;)\n</p>
> but this would also prevent users from posting links to wikipedia like
> "http://en.wikipedia.org/wiki/Phone_(disambiguation)".

I got hit by the same thing this morning, when I tried to post a link
to:
http://maps.google.com/maps?f=q&hl=en&geocode=&q=Summerhill+and+MacLennan&sll=43.687177,-79.371672&sspn=0.021661,0.037594&ie=UTF8&t=h&z=16
and watched it completely fail. I ended up posting a link to
http://tinyurl.com/69xc2l which is kind of the same thing, but not
really.

Later,
Blake.
From: G. M. <gr...@mu...> - 2008-07-17 20:31:01
Hello,

I recently tried to see whether I could hijack my own site, to test if
it is secure enough. I tried to post the following snippet:

[What is the answer?](javascript:alert(42);)

but this would simply give me

<p><a href="javascript:alert(1">test</a>;)\n</p>

Though I have discovered that this is achievable with the Perl
implementation of Markdown with a backslash in front of the first ). It
looks like this:

[What is the answer?](javascript:alert(42\);)

and it works! OK, it's cool that there cannot be any javascript with
parentheses... but this would also prevent users from posting links to
Wikipedia like "http://en.wikipedia.org/wiki/Phone_(disambiguation)".

Is this a bug? Or a feature? Or a not quite well defined thing in the
Markdown specs?

Thanks for your attention :-)
Gregor
From: Waylan L. <wa...@gm...> - 2008-07-15 18:34:53
I've spent some time over the last couple of days improving (IMO) the
documentation for the various extensions to Python-Markdown. I've
completely revamped the Available_Extensions[1] page into three
sections to more clearly define how they fit into the scheme of things,
and for all extensions currently in the Git repo, I've added a page of
documentation.

I'm open to any criticisms, suggestions and/or corrections. Please feel
free to just edit the pages yourself with spelling and grammar
corrections. I will appreciate any and all feedback.

Yuri, is there a way to have the "Extension" tab rather than the
"Overview" tab highlighted at the top of the page when viewing the
pages for the individual extensions?

[1]: http://www.freewisdom.org/projects/python-markdown/Available_Extensions

--
----
Waylan Limberg
wa...@gm...
From: Artem Y. <ne...@gm...> - 2008-07-15 15:21:03
I reformatted the test suite, fixed a lot of bugs in the version with
ElementTree, and now all the tests are working.

I changed hr handling because in the NanoDOM version top-level hrs were
surrounded with p tags, and the p tags were stripped out in the toxml
method. Now LinePreprocessor replaces all hr declarations with "___";
then I added a Markdown._processHR method, and in
Markdown._processSection we now also check for hr. But I was also
forced to add this check to Markdown._processParagraph. Maybe the
simplest and fastest way of fixing it is just a plain replace of all
"<p><hr /></p>" with "<hr />" after serialization, but then we won't
get a valid ElementTree.

Concerning attributes ({@id=1234}), which were handled by NanoDOM, I
added a global function handleAttributes(text, parent), because it's
required in inline patterns (ImagePattern) and also in the Markdown
class. Now we process attributes in Markdown._processTree, after
applying inline patterns, but still in the same cycle. The new version
is slower than the previous one because of these changes, but still
faster than the new version with NanoDOM.

I also fixed ticket #5 [1] in the GSoC etree branch. Changing the order
of inline patterns works, but then other tests fail. I changed
BACKTICK_RE to r'[^\\]\`([^\`]*[^\\]{0,1})\`', and after that
everything works fine, except for stripping the last character before
the backtick. For instance, "test `test`" -> "test<code>test</code>"
instead of "test <code>test</code>". It's because of the negative
expression ([^\\]) at the beginning of the regexp, so I decided to add
to the Pattern class an attribute contentGroup, representing the number
of the group that we'd like to replace; by default it equals 2. And I
changed the regexp to r'([^\\])\`([^\`]*[^\\]{0,1})\`', so now we
should use group 3 instead of group 2, and we create the pattern in
this way: BacktickPattern(BACKTICK_RE, 3). Joined, group 1 and group 2
will be the string to the left of the match.

[1]: http://www.freewisdom.org/projects/python-markdown/Tickets/000005
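[Editorial illustration.] The escaped-backtick behavior Artem describes can be checked directly with the `re` module. This uses only the raw inline regex from the message; inside Python-Markdown the pattern gets wrapped further, which is why the effective group numbers shift and the `contentGroup` attribute he adds is needed:

```python
# The revised backtick pattern from the message: a backtick counts only
# when the character before it is not a backslash. Group 1 captures that
# preceding character (which is why plain replacement would strip it),
# and group 2 captures the code span itself.
import re

BACKTICK_RE = re.compile(r'([^\\])\`([^\`]*[^\\]{0,1})\`')

m = BACKTICK_RE.search('test `code`')
# m.group(1) is ' ' (the character before the backtick),
# m.group(2) is 'code'; escaped backticks (\`) do not match at all.
```

Note that `BACKTICK_RE.search(r'test \`escaped\`')` returns `None`, since both backticks are preceded by backslashes.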
From: Artem Y. <ne...@gm...> - 2008-07-10 23:58:56
Waylan Limberg wrote:
> Artem,
>
> You might want to take a look at tickets 4 & 5 if you get a chance.
> They both involve inline patterns.
>

OK, I think I'll fix them in the GSoC version.
From: Waylan L. <wa...@gm...> - 2008-07-10 03:43:52
On Tue, Jul 8, 2008 at 7:17 PM, Yuri Takhteyev <qar...@gm...> wrote:
>> If you turn on the feature, I'll copy the existing open reports over.
>
> http://www.freewisdom.org/projects/python-markdown/Tickets
>
> Once the bugs are copied over, we should close that tab in SF.

Done. I also updated the link on the wiki here:
http://www.freewisdom.org/projects/python-markdown/Reporting_Bugs.
Perhaps the Ticket page should replace that page.

While I was removing the bug tracker from SF (actually hiding it - you
can't delete them, so the data is still preserved and the admins can
still get to it) I also hid the SVN repo, as it's no longer being used.
The wiki had already been updated in that respect anyway.

Artem, you might want to take a look at tickets 4 & 5 if you get a
chance. They both involve inline patterns. I took the liberty of
assigning ticket 2 to you, as you already have a fix in your branch,
and noted as much in the ticket. Tickets 1 & 3 are ugly and should be
left alone until the dust settles from the GSoC work.

--
----
Waylan Limberg
wa...@gm...
From: Artem Y. <ne...@gm...> - 2008-07-09 19:31:55
John Szakmeister wrote:
> I was in the process of converting some of my old blog posts, and ran
> across this issue. Turns out, if you do either *[]() some text* or
> _[]() some text_, you end up with actual asterisks and underscores
> rather than having things emphasized. I first noticed it in an
> unordered list, but it turns out the same problem happens in regular
> text too. I've attached a sample text file that will reproduce the
> problem.
>
> -John
>

It's because of InlinePatterns limitations. For now you can use:

[*link*](http://example.com) some text

This bug is fixed in the GSoC version of Markdown -
http://gitorious.org/projects/python-markdown/repos/gsoc2008 - where
either of the variants works:

[*link*](http://example.com) some text
*[link](http://example.com) some text*
From: John S. <jo...@sz...> - 2008-07-09 19:19:02
I was in the process of converting some of my old blog posts, and ran
across this issue. Turns out, if you do either *[]() some text* or
_[]() some text_, you end up with actual asterisks and underscores
rather than having things emphasized. I first noticed it in an
unordered list, but it turns out the same problem happens in regular
text too. I've attached a sample text file that will reproduce the
problem.

-John
From: Waylan L. <wa...@gm...> - 2008-07-09 01:52:07
On Tue, Jul 8, 2008 at 9:00 PM, Artem Yunusov <ne...@gm...> wrote:
> Waylan Limberg wrote:
>> I then found lxml's htmldiff tool [1], which provided an easy
>> (better??) way to compare html docs, but it still hung up on some
>> (not all) whitespace. Additionally, it didn't exactly provide an
>> easily readable output to display in the test output. If you're
>> interested, I can forward the code I have - that is, if I can find
>> it.
>>
>
> Yep, it would be interesting.
>

Hmm, all I can find is a very simple little script that uses xmldiff
[1]. I doubt this is very useful, but I've attached it anyway.

[1]: http://www.logilab.org/859

--
----
Waylan Limberg
wa...@gm...
From: Artem Y. <ne...@gm...> - 2008-07-09 00:59:17
Waylan Limberg wrote:
> I then found lxml's htmldiff tool [1], which provided an easy
> (better??) way to compare html docs, but it still hung up on some (not
> all) whitespace. Additionally, it didn't exactly provide an easily
> readable output to display in the test output. If you're interested, I
> can forward the code I have - that is, if I can find it.
>

Yep, it would be interesting.

> What I'd consider doing is actually taking the most recent markdown
> with NanoDom and altering NanoDom's whitespace to match ET and run a
> little script that loops through all the tests and outputs new
> expected html files. It shouldn't be all that hard.
>

Yes, for now that seems a reasonable solution. Also, ET doesn't do any
output indentation, so I wrote a function that does some indentation
for ET. Another solution is to tune this function to match the previous
markdown output. I also tried to load the data from the tests' html
files into ET and then serialize it, but there are some issues and I
didn't succeed at it.
From: Artem Y. <ne...@gm...> - 2008-07-09 00:59:12
Yuri Takhteyev wrote:
> We could re-think our choice of placeholders if we know that this is
> the reason. But it sounds like elementTree is the way to go.
>

Yes, I agree.

> A few minor things. The current version in git fails on non-ASCII
> files (e.g., tests/misc/russian.txt). That's because we end up
> encoding the content too early: line 1889 writes etree to xml, utf8
> encoded, after which we try to run textPostProcessors on it. That's
> not good. This seems to fix it:
>
> xml = codecs.decode(etree.tostring(root, encoding="utf8"), "utf8")
>

Strange, I didn't notice it in my version. Thanks for the fix.

> (I am assuming that standard etree doesn't have an option of
> serializing to non-encoded unicode. If it does, use that instead.)
>
> Note that in my experience there is only one way to use Unicode right
> with Python: assume that all strings are unicode. So, for this reason,
> I've been following the policy of decoding data when it comes into my
> world and encoding it only when it comes out, without _ever_ passing
> encoded strings around. Encoded strings are evil.
>

Thanks for the advice.

> Another thing: lots of tests seem to fail now because of whitespace
> differences. I am guessing that the way to solve it is to first extend
> test-markdown.py to add an option of reflowing XHTML before diffing.
> Then, once we know that all tests pass except for white space
> differences, we can change the expected output.
>

Maybe we should worry about whitespace straight away, because we'll
need to fix the failing tests anyway.
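[Editorial illustration.] An indentation helper of the kind Artem mentions might look like the sketch below. Stock ElementTree of that era produced no pretty-printing at all (modern Python 3.9+ ships `xml.etree.ElementTree.indent` for exactly this); the function name and the two-space step are illustrative choices:

```python
# Sketch: recursively insert newline/indent whitespace into an
# ElementTree so its serialized form is readable. Only whitespace-only
# text/tail values are touched, so real content is left alone.
import xml.etree.ElementTree as etree

def indent(elem, level=0):
    pad = "\n" + "  " * level
    if len(elem):
        if not (elem.text or "").strip():
            elem.text = pad + "  "          # indent before first child
        for child in elem:
            indent(child, level + 1)
            if not (child.tail or "").strip():
                child.tail = pad + "  "     # indent between siblings
        if not (elem[-1].tail or "").strip():
            elem[-1].tail = pad             # dedent after last child
    elif level and not (elem.tail or "").strip():
        elem.tail = pad
```

For example, `indent()` applied to the tree for `<div><p>a</p><p>b</p></div>` makes it serialize with each `<p>` on its own indented line.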
From: Yuri T. <qar...@gm...> - 2008-07-08 23:17:21
> If you turn on the feature, I'll copy the existing open reports over.

http://www.freewisdom.org/projects/python-markdown/Tickets

I copied over one ticket so far:
http://www.freewisdom.org/projects/python-markdown/Tickets/000001

Keep one thing in mind when copying the tickets: the description field
is markdown, so HTML won't be escaped automatically. Some of the bugs
have HTML in them, so it needs to be escaped.

Once the bugs are copied over, we should close that tab in SF.

Feel free to suggest (simple) features. For now, here is an
entertaining one:
http://www.freewisdom.org/projects/python-markdown/Tickets/000001.raw
will give you the raw content of the ticket in Lua format. If anyone
cares, I can add a JSON or XML output. (Or is there a format that is
more python-friendly?)

- yuri

--
http://sputnik.freewisdom.org/
From: Waylan L. <wa...@gm...> - 2008-07-08 17:52:55
On Tue, Jul 8, 2008 at 12:40 PM, Yuri Takhteyev <qar...@gm...> wrote:
>> Is this still true if you have inline not-necessarily-legal-XML
>> blocks? (i.e. will it still be easy to convert:
>> **Foo**
>> <br>
>> blah blah blah
>> 'bar'
>> ?)
>
> I meant a simple RE-based substitution. Correct me if I am wrong, but
> converting XHTML into HTML largely involves changing <$x/> and
> <$x></$x> to <$x> for certain values of $x.

Yeah, that *should* cover the basics. Of course, anyone could always
pass Markdown's output into uTidylib [1] or ElementTree Tidy [2] if
they want a solid conversion. Unfortunately, it will likely slow things
down too much to offer that option in Markdown directly. However, it
may not be a bad idea to have an extension for those who want it.

Hmm, now to get back on-subject - I wonder if either of those tools
will do whitespace normalization only, without making any other changes
to the output. It's worth exploring for the tests.

[1]: http://utidylib.berlios.de/
[2]: http://effbot.org/zone/element-tidylib.htm

>> What if we went with the BOM character (0xFEFF) as the replacement?
>> It's legal unicode, and _extremely_ unlikely to occur in the middle
>> of text. The only thing to watch out for is having it occur at the
>> start of the file.
>
> First, my original intention was to use not \u0001 and \u0002 but
> rather \u0002 and \u0003 - "start of text" (STX) and "end of text"
> (ETX). The nice thing about them is that they come as a pair - start
> and end. Also, if we use BOM we'll have to worry about HTML, etc.
> occurring in the beginning of the text. But this is an option to keep
> in mind. Alternatively, we can look into the private ranges, though
> then we have to make sure that our use does not conflict with possible
> private uses by the caller.
>
> - yuri

--
----
Waylan Limberg
wa...@gm...
From: Yuri T. <qar...@gm...> - 2008-07-08 16:40:23
> Is this still true if you have inline not-necessarily-legal-XML
> blocks? (i.e. will it still be easy to convert:
> **Foo**
> <br>
> blah blah blah
> 'bar'
> ?)

I meant a simple RE-based substitution. Correct me if I am wrong, but
converting XHTML into HTML largely involves changing <$x/> and
<$x></$x> to <$x> for certain values of $x.

> What if we went with the BOM character (0xFEFF) as the replacement?
> It's legal unicode, and _extremely_ unlikely to occur in the middle of
> text. The only thing to watch out for is having it occur at the start
> of the file.

First, my original intention was to use not \u0001 and \u0002 but
rather \u0002 and \u0003 - "start of text" (STX) and "end of text"
(ETX). The nice thing about them is that they come as a pair - start
and end. Also, if we use BOM we'll have to worry about HTML, etc.
occurring in the beginning of the text. But this is an option to keep
in mind. Alternatively, we can look into the private ranges, though
then we have to make sure that our use does not conflict with possible
private uses by the caller.

- yuri

--
http://sputnik.freewisdom.org/
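[Editorial illustration.] The simple RE-based substitution Yuri has in mind - rewriting `<$x/>` and `<$x></$x>` to `<$x>` for void elements - might be sketched as below. The list of void elements here is illustrative and not exhaustive, and this is a sketch of the idea rather than anything from Python-Markdown:

```python
# Sketch: collapse XHTML-style void elements to their HTML4 form.
import re

VOID = r'(br|hr|img|input|meta|link)'

def xhtml_to_html(text):
    def collapse(m):
        # Keep any attributes, but drop the trailing space and slash.
        attrs = (m.group(2) or "").rstrip()
        return "<%s%s>" % (m.group(1), attrs)
    # <br />, <br/>, <img src="..." />  ->  <br>, <img src="...">
    text = re.sub(r'<%s(\s[^>]*?)?\s*/>' % VOID, collapse, text)
    # <br></br>  ->  <br>
    text = re.sub(r'<%s></\1>' % VOID, r'<\1>', text)
    return text
```

Non-void markup such as `<p>text</p>` passes through untouched, which is the point: the substitution only needs to fire "for certain values of $x".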
From: Waylan L. <wa...@gm...> - 2008-07-08 14:03:24
On Tue, Jul 8, 2008 at 2:53 AM, Yuri Takhteyev <qar...@gm...> wrote:
>
> Let me know whether you prefer to stick with the SF tracker for now or
> switch. Or we can revisit this when the comment functionality is
> there. Or feel free to suggest features.
>

Bug tracking is the one thing I'm most eager to move away from SF, so I
say switch. As it stands, there aren't any discussions going on on the
SF tracker anyway. I seem to be the only one leaving any comments -
which I can easily do in the wikipage format. If you turn on the
feature, I'll copy the existing open reports over.

--
----
Waylan Limberg
wa...@gm...
From: Waylan L. <wa...@gm...> - 2008-07-08 13:52:13
On Tue, Jul 8, 2008 at 1:44 AM, Yuri Takhteyev <qar...@gm...> wrote:
> But it sounds like elementTree is the way to go.
>

I agree. It doesn't appear that lxml adds any real value. Add in the
trouble installing it, and I doubt many would ever use it. I'd say
leave it out for now. If things improve in the future, it won't be that
hard to add it back in.

> Another thing: lots of tests seem to fail now because of whitespace
> differences. I am guessing that the way to solve it is to first extend
> test-markdown.py to add an option of reflowing XHTML before diffing.
> Then, once we know that all tests pass except for white space
> differences, we can change the expected output.

Interestingly, I had started work on this at some point, but never got
very far. My intended approach was to feed the output and expected
output both into an x/html parser, normalize whitespace, and then diff
the output of each. Thing is, I couldn't find a Python tool that
actually did that. Well, there always is BeautifulSoup, but that could
very easily alter some of the html and hide bugs - defeating the
purpose of testing. Considering that whitespace is insignificant in
x/html and the number of x/html tools available in Python, you'd think
whitespace normalization would be a standard feature. Ah well.

I thought about doing a simple whitespace normalization on the string
using string.replace or re.sub. But then we'd lose all linebreaks, so
that the entire doc is on one line. That's kind of hard to diff.
Looping through a dom and normalizing each string was more than I
wanted to do.

I then found lxml's htmldiff tool [1], which provided an easy
(better??) way to compare html docs, but it still hung up on some (not
all) whitespace. Additionally, it didn't exactly provide an easily
readable output to display in the test output. If you're interested, I
can forward the code I have - that is, if I can find it.

What I'd consider doing is actually taking the most recent markdown
with NanoDom, altering NanoDom's whitespace to match ET, and running a
little script that loops through all the tests and outputs new expected
html files. It shouldn't be all that hard.

[1]: http://codespeak.net/lxml/lxmlhtml.html#html-diff

--
----
Waylan Limberg
wa...@gm...
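[Editorial illustration.] A minimal version of the normalize-then-diff idea Waylan describes, using only the standard library. It is a sketch of the approach, not the script he mentions: real normalization would also need to leave `<pre>` blocks untouched, and the "one tag per line" rule is a deliberate simplification so the diff stays readable:

```python
# Sketch: make whitespace between tags insignificant before diffing,
# while keeping the output multi-line so diffs are readable.
import difflib
import re

def normalize(html):
    html = re.sub(r'>\s+<', '>\n<', html)   # one tag boundary per line
    html = re.sub(r'[ \t]+', ' ', html)     # collapse runs of spaces/tabs
    return html.strip()

def html_diff(expected, actual):
    """Return unified-diff lines; empty list means 'same modulo whitespace'."""
    return list(difflib.unified_diff(
        normalize(expected).splitlines(),
        normalize(actual).splitlines(),
        lineterm=''))
```

With this, two renderings that differ only in inter-tag whitespace compare equal, while a real content difference still shows up as diff output.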
From: Blake W. <bw...@la...> - 2008-07-08 11:40:24
Yuri Takhteyev wrote:
>> Concerning the html/xhtml output, I discovered that this option is
>> supported only by new versions of ElementTree (1.3) and lxml (2.0),
>> so it won't be available for now on the standard Python 2.5
>> ElementTree. Maybe we can make it optional.
>
> Again, I wouldn't worry too much about this. If someone wants HTML
> output, converting XHTML to HTML4 should be easy enough.

Is this still true if you have inline not-necessarily-legal-XML blocks?
(i.e. will it still be easy to convert:

**Foo**
<br>
blah blah blah
'bar'

?)

>> There is one problem with lxml: the misc/boldlinks test causes this
>> error:
>>
>> File "etree.pyx", line 693, in etree._Element.text.__set__
>> File "apihelpers.pxi", line 344, in etree._setNodeText
>> File "apihelpers.pxi", line 648, in etree._utf8
>> AssertionError: All strings must be XML compatible, either Unicode or
>> ASCII
>>
>> I suppose that is because in this test we try to assign to el.text
>> data that contains placeholders, and maybe for some reason lxml
>> treats the placeholder values (u'\u0001' and u'\u0002') as not
>> unicode or ascii.
>
> We could re-think our choice of placeholders if we know that this is
> the reason. But it sounds like elementTree is the way to go.

What if we went with the BOM character (0xFEFF) as the replacement?
It's legal unicode, and _extremely_ unlikely to occur in the middle of
text. The only thing to watch out for is having it occur at the start
of the file.

Later,
Blake.
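[Editorial illustration.] For concreteness, the STX/ETX scheme Yuri favors wraps a stash index in a paired start and end marker. The helper names below are illustrative, not Python-Markdown's actual implementation; the point is only that `\u0002`/`\u0003` delimit both sides of the placeholder, where a single BOM marker would not:

```python
# Sketch: stash raw HTML behind STX/ETX-delimited placeholders during
# processing, then swap the stashed chunks back in at the end.
STX, ETX = "\u0002", "\u0003"  # "start of text" / "end of text"

def make_placeholder(index):
    return "%s%d%s" % (STX, index, ETX)

def restore(text, stash):
    # Replace each placeholder with the corresponding stashed HTML.
    for i, html in enumerate(stash):
        text = text.replace(make_placeholder(i), html)
    return text
```

Because the control characters never occur in normal input, the placeholders cannot collide with the document's own text.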
From: Yuri T. <qar...@gm...> - 2008-07-08 06:53:10
I upgraded the wiki on the site
(http://www.freewisdom.org/projects/python-markdown/) and locked the
front page against edits by non-admin users because of an influx of
spam.

This upgrade also brings the option of using my own mini bug tracker,
which is finally usable. It's still quite minimalistic, but arguably
it's hard to do worse than the SourceForge one. If we decide to use it,
it would be just like this one:
http://sputnik.freewisdom.org/en/Tickets

The main limitation is that you cannot attach comments to the ticket -
instead, there is a single wiki page for the discussion. (Attaching
comments is the next feature on the list, but I have some traveling in
the next few weeks, so it will take some time.)

Let me know whether you prefer to stick with the SF tracker for now or
switch. Or we can revisit this when the comment functionality is there.
Or feel free to suggest features.

- yuri

--
http://sputnik.freewisdom.org/