From: Waylan L. <wa...@gm...> - 2008-11-11 16:42:26
Attachments:
block.py
|
On Mon, Nov 10, 2008 at 9:23 AM, Waylan Limberg <wa...@gm...> wrote:
> On Mon, Nov 10, 2008 at 4:37 AM, Yuri Takhteyev <qar...@gm...> wrote:
>> Perhaps we can find a way to make the parser a little more flexible,
>> so that def_list extension will just need to add a handler to it,
>> rather than having to swap in a new parser.
>
> Actually, over the last few days I've put together a completely new
> core processor which works very differently. Like the other pluggable
> parts of markdown, it loops through a list (OrderedDict actually) of
> BlockProcessors and parses the source text one block at a time. An
> extension could easily add, remove, replace individual pieces of the
> parser without the problems we currently have. Additionally, it runs
> faster (by about 1 second for 1000 iterations) than the current code
> and uses less recursion. It also overcomes some of the parsing
> differences unique to the python implementation (i.e.: <p>s in lists
> match pl and php behavior) although an extension could easily provide
> the current behavior instead.
>

Well, I'm satisfied enough with what I have to release it into the
wild. I'm not convinced it's ready for prime time just yet, but I
think it's mostly there. I started this as an extension that replaced
the core parser rather than hacking on the existing code in
markdown.py. For one, it made timing against the old code easier, and
two, with none of the old code in front of me, it forced me to start
from scratch. Unfortunately, after working out the remaining edge
cases, it is barely any faster than the old code, but it is much more
customizable, so we do gain a lot IMO.

Please note that I did not even try to match the old core's behavior.
I built this to match pl and php's behavior when parsing blocks (which
means ticket 1 can finally be closed). I suspect a number of tests
will need to be reviewed and updated once I copy this over into
markdown.py.

The extension is attached as block.py. Copy it to your
markdown_extensions dir and run markdown with the 'block' extension
for testing. Any and all feedback is welcome. Unless I hear some
strong objections, I'll start migrating this code to markdown.py and
update the tests and extensions in a branch.

--
----
Waylan Limberg
wa...@gm...
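For readers without the attachment, the design described in this message (an ordered collection of BlockProcessors, applied one block at a time) can be sketched roughly as follows. This is not the code from block.py: the class names, the test()/run() methods, and the parse() function are all assumptions made for illustration, and the real processors handle far more cases.

```python
from collections import OrderedDict
import re


class BlockProcessor:
    """Hypothetical base class: test() decides whether run() handles a block."""

    def test(self, block):
        raise NotImplementedError

    def run(self, block):
        raise NotImplementedError


class HashHeaderProcessor(BlockProcessor):
    """Illustrative processor for single-line '# Title' style headers."""

    PATTERN = re.compile(r"^(#{1,6})\s*(.+?)\s*#*$")

    def test(self, block):
        return bool(self.PATTERN.match(block))

    def run(self, block):
        match = self.PATTERN.match(block)
        level = len(match.group(1))
        return "<h%d>%s</h%d>" % (level, match.group(2), level)


class ParagraphProcessor(BlockProcessor):
    """Catch-all processor; it must come last in the ordering."""

    def test(self, block):
        return True

    def run(self, block):
        return "<p>%s</p>" % block


def parse(text, processors):
    """Split text on blank lines and give each block to the first match."""
    output = []
    for block in re.split(r"\n\s*\n", text.strip()):
        for processor in processors.values():
            if processor.test(block):
                output.append(processor.run(block))
                break
    return "\n".join(output)


# The ordering of the dict is the parsing priority.
processors = OrderedDict()
processors["hashheader"] = HashHeaderProcessor()
processors["paragraph"] = ParagraphProcessor()

html = parse("# Title\n\nSome text.", processors)
print(html)
```

In a layout like this, an extension would only need to add, remove, or replace one entry in the dict rather than swap in a whole new parser.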
From: Waylan L. <wa...@gm...> - 2008-11-11 16:55:28
|
On Tue, Nov 11, 2008 at 11:42 AM, Waylan Limberg <wa...@gm...> wrote:
>
> Well, I'm satisfied enough with what I have to release it into the
> wild. I'm not convinced it's ready for prime time just yet, but I
> think it's mostly there. I started this as an extension that replaced
> the core parser rather than hacking on the existing code in
> markdown.py. For one, it made timing against the old code easier, and
> two, with none of the old code in front of me, it forced me to start
> from scratch. Unfortunately, after working out the remaining edge
> cases, it is barely any faster than the old code, but it is much more
> customizable, so we do gain a lot IMO.
>

Oh, one thing I forgot to mention is that I have disabled the header
and hr preprocessors. The new core parser handles that stuff itself
and does so faster than the preprocessors did. A small, but noteworthy
detail I think.

--
----
Waylan Limberg
wa...@gm...
From: Yuri T. <qar...@gm...> - 2008-11-13 09:27:02
|
> Well, I'm satisfied enough with what I have to release it into the
> wild. I'm not convinced it's ready for prime time just yet, but I
> think it's mostly there. I started this as an extension that replaced
> the core parser rather than hacking on the existing code in
> markdown.py. For one, it made timing against the old code easier, and
> two, with none of the old code in front of me, it forced me to start
> from scratch. Unfortunately, after working out the remaining edge
> cases, it is barely any faster than the old code, but it is much more
> customizable, so we do gain a lot IMO.

Yes, this certainly looks more customizable and is also easier to
understand. Good job! If we can switch to this implementation
without breaking too many tests, then let's just do this.

A few small details:

1. Now that this is fresh in your head, please add comments on how this works.
2. I would get rid of all abbreviations. E.g., I would make it
   "SetexHeaderProcessor" instead of "SHeaderProcessor".
3. I would replace 4 with TAB_LENGTH everywhere.

- yuri

--
http://sputnik.freewisdom.org/
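Point 3 can be illustrated with a short sketch. Only the name TAB_LENGTH and the value 4 come from the message above; the two helper functions are hypothetical and exist only to show indentation checks driven by one named constant rather than a repeated literal.

```python
TAB_LENGTH = 4  # single place to change the indent width


def is_indented(line):
    """True if the line starts with one full indent level of spaces."""
    return line.startswith(" " * TAB_LENGTH)


def dedent(line):
    """Strip one indent level, leaving under-indented lines untouched."""
    return line[TAB_LENGTH:] if is_indented(line) else line


print(repr(dedent("    code")))
print(repr(dedent("  text")))
```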
From: Waylan L. <wa...@gm...> - 2008-11-13 13:58:14
|
On Thu, Nov 13, 2008 at 4:26 AM, Yuri Takhteyev <qar...@gm...> wrote:
>> Well, I'm satisfied enough with what I have to release it into the
>> wild. I'm not convinced it's ready for prime time just yet, but I
>> think it's mostly there. I started this as an extension that replaced
>> the core parser rather than hacking on the existing code in
>> markdown.py. For one, it made timing against the old code easier, and
>> two, with none of the old code in front of me, it forced me to start
>> from scratch. Unfortunately, after working out the remaining edge
>> cases, it is barely any faster than the old code, but it is much more
>> customizable, so we do gain a lot IMO.
>
> Yes, this certainly looks more customizable and is also easier to
> understand. Good job! If we can switch to this implementation
> without breaking too many tests, then let's just do this.

Thanks. I crunched on the bugs last night and only have one failing
test left. I'll push once I get a few implementation details worked
out.

Speaking of which, would you suggest leaving the processors in the
parser class, or should they be defined in the Markdown class and
passed into the parser on init? To me, the latter seems to provide a
more consistent API for extension authors. Any input in that respect
would be appreciated.

>
> A few small details:
>
> 1. Now that this is fresh in your head, please add comments on how this works.
> 2. I would get rid of all abbreviations. E.g., I would make it
> "SetexHeaderProcessor" instead of "SHeaderProcessor".
> 3. I would replace 4 with TAB_LENGTH everywhere.
>

All good points. Will do.

--
----
Waylan Limberg
wa...@gm...
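The second option raised in this message (processors defined in the Markdown class and passed into the parser on init) might look something like the sketch below. Every name here is hypothetical, including the extend_markdown() hook, and the processors are stood in for by placeholder strings; the point is only that extensions edit one registry on the Markdown object before the parser is built.

```python
from collections import OrderedDict


class BlockParser:
    """Hypothetical parser that receives its processors; it defines none."""

    def __init__(self, processors):
        self.processors = processors


class Markdown:
    """Hypothetical top-level class that owns the processor registry."""

    def __init__(self, extensions=()):
        self.block_processors = OrderedDict()
        # Placeholder strings stand in for real processor instances.
        self.block_processors["hashheader"] = "HashHeaderProcessor placeholder"
        self.block_processors["paragraph"] = "ParagraphProcessor placeholder"
        # Extensions modify the registry before the parser is created.
        for extension in extensions:
            extension.extend_markdown(self)
        self.parser = BlockParser(self.block_processors)


class DefListExtension:
    """Hypothetical extension that registers one extra block processor."""

    def extend_markdown(self, md):
        md.block_processors["deflist"] = "DefListProcessor placeholder"
        # Keep the catch-all paragraph processor last in the ordering.
        md.block_processors.move_to_end("paragraph")


md = Markdown(extensions=[DefListExtension()])
order = list(md.parser.processors)
print(order)
```

With this arrangement an extension author touches block processors the same way as the other pluggable parts, through attributes on the Markdown instance, and never needs to subclass or replace the parser itself.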
From: Waylan L. <wa...@gm...> - 2008-11-14 04:44:33
|
On Thu, Nov 13, 2008 at 8:58 AM, Waylan Limberg <wa...@gm...> wrote:
> On Thu, Nov 13, 2008 at 4:26 AM, Yuri Takhteyev <qar...@gm...> wrote:
>>
>> Yes, this certainly looks more customizable and is also easier to
>> understand. Good job! If we can switch to this implementation
>> without breaking too many tests, then let's just do this.
>
> Thanks. I crunched on the bugs last night and only have one failing
> test left. I'll push once I get a few implementation details worked
> out.
>

FYI, I just pushed this. The only failing test is for definition lists
as I haven't touched that yet. Everything else is passing and we
*should* be more similar to perl and php implementations. Enjoy!

I just marked tickets 1 & 21 as fixed. We now have no open tickets
except 8 which is a nice-to-have. Although, we already do it better
than everyone else, so it doesn't really matter.

--
----
Waylan Limberg
wa...@gm...
From: Waylan L. <wa...@gm...> - 2008-11-15 02:27:55
|
In case anyone is interested, I just pushed changes to
docs/writing_extensions.txt [1] for the new BlockParser (starting at
line 165). The code is documented as well, but sometimes the overview
text is easier to start with. Any corrections, additions, criticisms
or other feedback are welcome.

[1]: http://gitorious.org/projects/python-markdown/repos/mainline/blobs/master/docs%2Fwriting_extensions.txt#line165

On Thu, Nov 13, 2008 at 11:44 PM, Waylan Limberg <wa...@gm...> wrote:
> On Thu, Nov 13, 2008 at 8:58 AM, Waylan Limberg <wa...@gm...> wrote:
>> On Thu, Nov 13, 2008 at 4:26 AM, Yuri Takhteyev <qar...@gm...> wrote:
>>>
>>> Yes, this certainly looks more customizable and is also easier to
>>> understand. Good job! If we can switch to this implementation
>>> without breaking too many tests, then let's just do this.
>>
>> Thanks. I crunched on the bugs last night and only have one failing
>> test left. I'll push once I get a few implementation details worked
>> out.
>>
>
> FYI, I just pushed this. The only failing test is for definition lists
> as I haven't touched that yet. Everything else is passing and we
> *should* be more similar to perl and php implementations. Enjoy!
>
> I just marked tickets 1 & 21 as fixed. We now have no open tickets
> except 8 which is a nice-to-have. Although, we already do it better
> than everyone else, so it doesn't really matter.
>
>
>
> --
> ----
> Waylan Limberg
> wa...@gm...
>

--
----
Waylan Limberg
wa...@gm...