Re: [Python-markdown-discuss] Use case Question


The primary (public-facing) difference between Mistune and Python-Markdown is that the Mistune parser outputs an AST which then needs to be fed to a renderer (of course an HTML renderer is conveniently provided), while Python-Markdown only ever outputs the rendered HTML. Because of that difference, using a Python-Markdown extension will likely give you a less-than-complete document.

For example, you could subclass (or monkey patch) the `Markdown` class and replace the default serializer (which converts the ElementTree object to an HTML string). However, after the serializer runs, a number of postprocessors run which complete the document. In fact, one of those postprocessors specifically replaces all the “raw HTML placeholders” with the raw HTML content. That is relevant to you because you are using fenced code blocks. The fenced code extension runs as a preprocessor which removes all fenced code blocks from the document, wraps them in HTML `<pre><code>` tags, and then stores them in the raw HTML stash. In other words, the fenced code blocks aren’t even in the document until after the postprocessors run (they are held in the raw HTML stash, with their location in the document maintained by placeholder strings).
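
To make the placeholder mechanism above concrete, here is a toy model in plain Python. It is a sketch for illustration only, not Python-Markdown's code: the `preprocess`/`postprocess` names, the STX/ETX-wrapped placeholder format and the `language-` class prefix are all assumptions. It shows why anything that inspects the document between those two steps (such as a replacement serializer) sees only a placeholder where the fenced code should be.

```python
import re
from html import escape

# Toy model only: names, placeholder format and class prefix are hypothetical,
# not Python-Markdown's actual internals.
FENCE = "`" * 3  # a literal triple backtick
FENCE_RE = re.compile(rf"^{FENCE}(\w*)\n(.*?)^{FENCE}[ \t]*$", re.MULTILINE | re.DOTALL)
PLACEHOLDER = "\x02stash:{}\x03"  # STX/ETX-wrapped marker (assumed format)


def preprocess(text, stash):
    """Replace each fenced block with a placeholder; keep its HTML in `stash`."""
    def _store(match):
        lang, code = match.group(1), match.group(2)
        cls = f' class="language-{lang}"' if lang else ""
        stash.append(f"<pre><code{cls}>{escape(code)}</code></pre>")
        return PLACEHOLDER.format(len(stash) - 1)

    return FENCE_RE.sub(_store, text)


def postprocess(output, stash):
    """Swap every placeholder back for the raw HTML it stands for."""
    for index, raw_html in enumerate(stash):
        output = output.replace(PLACEHOLDER.format(index), raw_html)
    return output


if __name__ == "__main__":
    stash = []
    source = f"# Do stuff here\n\n{FENCE}shell\necho foo\necho bar\n{FENCE}\n"
    without_code = preprocess(source, stash)
    print(repr(without_code))  # only the heading and a placeholder remain
    print(postprocess(without_code, stash))
```

The ETX terminator on each placeholder keeps `stash:1` from matching the start of `stash:10`, which is why the marker is delimited on both sides.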

Python-Markdown expects the entire process to run in order to produce the entire document. Therefore, if you want to use Python-Markdown, I would suggest taking the HTML output, feeding it into an HTML parser and extracting your JSON from that. In fact, your use case is exactly the sort of thing Mistune’s parser/renderer structure is designed for. I would rather use Mistune for that (building my own custom renderer), and I’m the author of Python-Markdown. Of course, I would miss some of Python-Markdown’s extensions…
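
A minimal sketch of the "feed the HTML into a parser" route, using only the standard library's `html.parser`. Assumptions: the input is roughly what Python-Markdown with the fenced code extension produces for the example in the quoted message (the `class` on `<code>` varies by version and settings, so both `language-shell` and a bare `shell` are accepted); prerequisites sit in a paragraph or list under a heading named "Prerequisites"; and a `result` block applies to the last command seen. `BlockCollector`, `to_schema` and `html_to_schema` are hypothetical names.

```python
import json
from html.parser import HTMLParser

# Which tags start a block we care about, and what kind of block each is.
BLOCK_KINDS = {"h1": "heading", "h2": "heading", "h3": "heading",
               "h4": "heading", "h5": "heading", "h6": "heading",
               "p": "para", "li": "para", "pre": "code"}


class BlockCollector(HTMLParser):
    """Flatten rendered HTML into a list of (kind, info, text) tuples."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._kind = None  # kind of the block currently open, if any
        self._info = ""    # language of the current code block
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_KINDS and self._kind is None:
            self._kind, self._info, self._buf = BLOCK_KINDS[tag], "", []
        elif tag == "code" and self._kind == "code":
            classes = (dict(attrs).get("class") or "").split()
            if classes:
                # Accept "language-shell" as well as a bare "shell" class.
                lang = classes[0]
                if lang.startswith("language-"):
                    lang = lang[len("language-"):]
                self._info = lang

    def handle_endtag(self, tag):
        if self._kind is not None and BLOCK_KINDS.get(tag) == self._kind:
            self.blocks.append((self._kind, self._info, "".join(self._buf)))
            self._kind = None

    def handle_data(self, data):
        if self._kind is not None:
            self._buf.append(data)


def to_schema(blocks):
    """Map the flattened blocks onto the prerequisites/commands schema."""
    schema = {"prerequisites": [], "commands": []}
    section = None
    for kind, info, text in blocks:
        lines = [line.strip() for line in text.splitlines() if line.strip()]
        if kind == "heading":
            section = text.strip().lower()
        elif kind == "para" and section == "prerequisites":
            schema["prerequisites"].extend(lines)
        elif kind == "code" and info == "result":
            # A result block belongs to the last command seen so far.
            if schema["commands"]:
                schema["commands"][-1]["expected_result"] = "\n".join(lines)
        elif kind == "code":
            schema["commands"].extend({"command": line} for line in lines)
    return schema


def html_to_schema(html_text):
    collector = BlockCollector()
    collector.feed(html_text)
    collector.close()
    return to_schema(collector.blocks)


if __name__ == "__main__":
    sample = (
        "<h1>Prerequisites</h1>\n<p>prereq.md\nprereq-2.md</p>\n"
        "<h1>Do stuff here</h1>\n"
        '<pre><code class="language-shell">echo foo\necho bar\n</code></pre>\n'
        "<h1>Do more stuff here</h1>\n"
        '<pre><code class="language-shell">echo baz\n</code></pre>\n'
        "<h1>Results</h1>\n"
        '<pre><code class="language-result">baz\n</code></pre>\n'
    )
    print(json.dumps(html_to_schema(sample), indent=2))
```

For the sample, this yields the two prerequisites and three commands, with `expected_result` attached to `echo baz`, matching the structure of the expected output in the quoted message. Working on the final HTML sidesteps the stash problem entirely, because by then every placeholder has been replaced.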

We have discussed a refactor of Python-Markdown to better fit that use case, but that would be a lot of work for little gain and I just don’t have the time to work on it. Therefore, this much older lib continues to work only as a Markdown to HTML tool.

Waylan Limberg

> On Dec 20, 2017, at 4:24 PM, Tommy Falgout <to...@la...> wrote:
> 
> Dave,
> 
> My goal is to convert Markdown into a custom JSON schema.   For example, if I had a file of:
> 
> # Prerequisites
> prereq.md
> prereq-2.md
> 
> # Do stuff here
> 
> ```shell
> echo foo
> echo bar
> ```
> # Do more stuff here
> ```shell
> echo baz
> ```
> # Results
> The only thing that makes it a result is the code type is result.
> We assume the result is for the last command of the last code block
> ```result
> baz
> ```
> 
> 
> An expected output from parsing the input doc would be:
> {
>             'prerequisites': ['prereq.md', 'prereq-2.md'],
>             'commands': [
>                 { 'command': 'echo foo' },
>                 { 'command': 'echo bar' },
>                 { 'command': 'echo baz', 'expected_result': 'baz' } ]
> }
> 
> 
> Essentially, it takes everything inside the code block and creates a list of commands to run.  (Some of the commands would have results to compare against.  For example, "echo baz" should see a result of "baz").  Prerequisites are other files that it should run before the commands.
> 
> My goal is to parse a markdown file and create a JSON document that contains the list of commands to execute, prerequisite files, etc.
> 
> Am I going down the right path with building my own extension? If so, any pointers?
> 
> 
> 
> On Wed, Dec 20, 2017 at 12:17 PM Dave Pawson <dav...@gm...> wrote:
> Which input / which output is not exactly clear - no ID values in Python?
> 
> General answer though. md has both block and inline constructs
> so it would seem to meet json needs?
> 
> Perhaps try it as an extension first? one block, one inline
> and see how easy it is to work?
> 
> I found it very logical.
> YMMV
> 
> 
> On 20 December 2017 at 17:44, Tommy Falgout <to...@la...> wrote:
> > Hello,
> >
> > I'm building a tool to convert markdown into runnable documentation and I
> > came across your project.  I'm currently using Mistune's BlockParser,
> > because I don't want to convert to HTML, instead into a custom JSON format.
> >
> > I looked at your documentation for writing an extension; however, I was
> > unsure if this would be the right path for me.  Can you please advise?  If
> > so, would I want to use a Preprocessor/Postprocessor/InlinePattern/etc.?
> >
> > Here's an example of an input document:
> > https://github.com/lastcoolnameleft/simdem2/blob/master/tests/test_parser.py#L25
> >
> > Here's the expected output:
> > https://github.com/lastcoolnameleft/simdem2/blob/master/tests/test_parser.py#L57
> >
> > Thanks,
> > -Tommy.
> >
> > ------------------------------------------------------------------------------
> > Check out the vibrant tech community on one of the world's most
> > engaging tech sites, Slashdot.org! http://sdm.link/slashdot <http://sdm.link/slashdot>
> > _______________________________________________
> > Python-markdown-discuss mailing list
> > Pyt...@li... <mailto:Pyt...@li...>
> > https://lists.sourceforge.net/lists/listinfo/python-markdown-discuss <https://lists.sourceforge.net/lists/listinfo/python-markdown-discuss>
> >
> 
> 
> 
> --
> Dave Pawson
> XSLT XSL-FO FAQ.
> Docbook FAQ.
> http://www.dpawson.co.uk