From: Tommy F. <to...@la...> - 2017-12-21 17:00:06
|
Waylan,

Thank you for the detailed breakdown of Mistune/Python-Markdown. Parsing Markdown to an AST is exactly the use case I’m looking for. The reason I started researching Python-Markdown is that Mistune didn’t meet all of my use cases and wasn’t parsing all Markdown primitives as I expected.

As you mentioned, parsing HTML into my desired JSON isn’t ideal for my scenario, so thanks for the clarity and for saving me some investigation time.

That said, I found another tool that advertises doing exactly what I’m looking for: https://github.com/miyuchina/mistletoe

I’ll explore that, unless you have any hands-on experience with it.

Cheers,
-Tommy.

From: Waylan Limberg <way...@ic...>
Date: Wednesday, December 20, 2017 at 5:02 PM
To: Tommy Falgout <to...@la...>
Cc: Dave Pawson <dav...@gm...>, PythonMD list <pyt...@li...>
Subject: Re: [Python-markdown-discuss] Use case Question

The primary (public-facing) difference between Mistune and Python-Markdown is that the Mistune parser outputs an AST which then needs to be fed to a renderer (of course, an HTML renderer is conveniently provided), while Python-Markdown only ever outputs the rendered HTML.

Because of that difference, using a Python-Markdown extension will likely result in you getting a less than complete document. For example, you could subclass (or monkey patch) the `Markdown` class and replace the default serializer (which converts the ElementTree object to an HTML string). However, after the serializer runs, a number of postprocessors run which complete the document. In fact, one of those postprocessors specifically replaces all the “raw HTML placeholders” with the raw HTML content. That is relevant to you because you are using fenced code blocks. The fenced code extension runs as a preprocessor which removes all fenced code blocks from the document, wraps them in HTML `<pre><code>` tags, and then stores them in the raw HTML stash.
In other words, the fenced code blocks aren’t even in the document until after the postprocessors run (they are held in the raw HTML stash with their location in the document maintained by placeholder strings). Python-Markdown expects that the entire process runs to get the entire document.

Therefore, if you want to use Python-Markdown, I would suggest taking the HTML output, feeding it into an HTML parser and extracting your JSON from that.

In fact, your use case is exactly the sort of thing Mistune’s parser/renderer structure is designed for. I would rather use Mistune for that (building my own custom renderer), and I’m the author of Python-Markdown. Of course, I would miss some of Python-Markdown’s extensions…

We have discussed a refactor of Python-Markdown to better fit that use case, but that would be a lot of work for little gain and I just don’t have the time to work on it. Therefore, this much older lib continues to work only as a Markdown to HTML tool.

Waylan Limberg

On Dec 20, 2017, at 4:24 PM, Tommy Falgout <to...@la...> wrote:

Dave,

My goal is to convert Markdown into a custom JSON schema. For example, if I had a file of:

    # Prerequisites
    prereq.md
    prereq-2.md
    # Do stuff here
    ```shell
    echo foo
    echo bar
    ```
    # Do more stuff here
    ```shell
    echo baz
    ```
    # Results
    The only thing that makes it a result is the code type is result. We assume the result is for the last command of the last code block
    ```result
    baz
    ```

An expected output from parsing the input doc would be:

    {
      'prerequisites': ['prereq.md', 'prereq-2.md'],
      'commands': [
        { 'command': 'echo foo' },
        { 'command': 'echo bar' },
        { 'command': 'echo baz', 'expected_result': 'baz' }
      ]
    }

Essentially, it takes everything inside the code blocks and creates a list of commands to run. (Some of the commands would have results to compare against. For example, "echo baz" should see a result of "baz".)
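
The mapping described above can be sketched without any Markdown library. This is a minimal illustration, not code from the thread or from the simdem2 project: the function name `parse_runnable_doc` is hypothetical, and it assumes fences are triple backticks starting at column zero, that prerequisites are the plain lines under a "# Prerequisites" heading, and that a `result` block applies to the last command collected so far.

```python
import json

# Built at runtime so the sample below contains no literal fence markers.
FENCE = "`" * 3


def parse_runnable_doc(text):
    """Collect prerequisites and fenced-code commands from a Markdown string."""
    doc = {"prerequisites": [], "commands": []}
    heading = None      # lower-cased text of the most recent "# ..." heading
    block_lang = None   # info string of the open fence, None outside a fence
    block_lines = []

    for line in text.splitlines():
        if block_lang is not None:
            if line.strip() == FENCE:
                # Closing fence: convert the collected lines into entries.
                if block_lang == "result":
                    # Assumption: a result belongs to the last command seen.
                    if doc["commands"]:
                        doc["commands"][-1]["expected_result"] = "\n".join(block_lines)
                else:
                    doc["commands"].extend(
                        {"command": c} for c in block_lines if c.strip()
                    )
                block_lang, block_lines = None, []
            else:
                block_lines.append(line)
        elif line.startswith(FENCE):
            block_lang = line[len(FENCE):].strip()
        elif line.startswith("#"):
            heading = line.lstrip("#").strip().lower()
        elif heading == "prerequisites" and line.strip():
            doc["prerequisites"].append(line.strip())
    return doc


sample = "\n".join([
    "# Prerequisites",
    "prereq.md",
    "prereq-2.md",
    "# Do stuff here",
    FENCE + "shell",
    "echo foo",
    "echo bar",
    FENCE,
    "# Do more stuff here",
    FENCE + "shell",
    "echo baz",
    FENCE,
    "# Results",
    FENCE + "result",
    "baz",
    FENCE,
])

result = parse_runnable_doc(sample)
print(json.dumps(result, indent=2))
```

A line scanner like this ignores inline markup, nested fences, and indented code, which is where a real parser that exposes an AST becomes the better fit.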
Prerequisites are other files that it should run before the commands.

My goal is to parse a Markdown file and create a JSON document that contains the list of commands to execute, prerequisite files, etc.

Am I going down the right path with building my own extension? If so, any pointers?

On Wed, Dec 20, 2017 at 12:17 PM Dave Pawson <dav...@gm...> wrote:

Which input / which output is not exactly clear - no ID values in Python?
General answer though. md has both block and inline constructs,
so it would seem to meet json needs?
Perhaps try it as an extension first? One block, one inline,
and see how easy it is to work?

I found it very logical. YMMV

On 20 December 2017 at 17:44, Tommy Falgout <to...@la...> wrote:
> Hello,
>
> I'm building a tool to convert markdown into runnable documentation and I
> came across your project. I'm currently using Mistune's BlockParser,
> because I don't want to convert to HTML, instead into a custom JSON format.
>
> I looked at your documentation for writing an extension; however, I was
> unsure if this would be the right path for me. Can you please advise? If
> so, would I want to use a Preprocessor/Postprocessor/InlinePattern/etc.?
>
> Here's an example of an input document:
> https://github.com/lastcoolnameleft/simdem2/blob/master/tests/test_parser.py#L25
>
> Here's the expected output:
> https://github.com/lastcoolnameleft/simdem2/blob/master/tests/test_parser.py#L57
>
> Thanks,
> -Tommy.
>
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Python-markdown-discuss mailing list
> Pyt...@li...
> https://lists.sourceforge.net/lists/listinfo/python-markdown-discuss
>

--
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
http://www.dpawson.co.uk
|
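
Waylan's suggestion in the thread (take Python-Markdown's HTML output, feed it into an HTML parser, and extract the JSON from that) can be sketched with the standard library's `html.parser`. This is an illustration under assumptions rather than anything posted to the list: the class `CodeBlockCollector` and the function `commands_from_html` are hypothetical names, and the sample HTML string is hand-written to resemble fenced-code output instead of being produced by Python-Markdown. The exact `class` attribute on `<code>` depends on the Python-Markdown version and configuration, so the sketch accepts both a bare language name and a `language-` prefix.

```python
import json
from html.parser import HTMLParser


class CodeBlockCollector(HTMLParser):
    """Collect (language, text) pairs for every <pre><code> block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_pre = False
        self._lang = ""
        self._buf = None  # list of text pieces while inside <code>, else None

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._in_pre = True
        elif tag == "code" and self._in_pre:
            cls = dict(attrs).get("class") or ""
            prefix = "language-"
            # Accept both "shell" and "language-shell" style class names.
            self._lang = cls[len(prefix):] if cls.startswith(prefix) else cls
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "code" and self._buf is not None:
            self.blocks.append((self._lang, "".join(self._buf)))
            self._buf = None
        elif tag == "pre":
            self._in_pre = False

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)


def commands_from_html(html):
    """Turn code blocks into command entries; 'result' blocks annotate the last one."""
    parser = CodeBlockCollector()
    parser.feed(html)
    parser.close()
    commands = []
    for lang, text in parser.blocks:
        lines = [ln for ln in text.splitlines() if ln.strip()]
        if lang == "result":
            if commands:
                commands[-1]["expected_result"] = "\n".join(lines)
        else:
            commands.extend({"command": ln} for ln in lines)
    return commands


# Hand-written sample; both class styles appear only to exercise the prefix handling.
sample_html = (
    "<h1>Do stuff here</h1>\n"
    '<pre><code class="shell">echo foo\necho bar\n</code></pre>\n'
    "<h1>Do more stuff here</h1>\n"
    '<pre><code class="language-shell">echo baz\n</code></pre>\n'
    "<h1>Results</h1>\n"
    '<pre><code class="language-result">baz\n</code></pre>\n'
)

commands = commands_from_html(sample_html)
print(json.dumps({"commands": commands}, indent=2))
```

Working from the finished HTML sidesteps the raw HTML stash entirely, because by then the postprocessors have already put the fenced blocks back into the document.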