From: Guenter M. <mi...@us...> - 2015-02-15 16:42:36
On 2015-02-03, Mark Andrews wrote:

> (I asked the following question on stackoverflow two days ago. It has not
> been given any answers or even comments and very few views, so I was hoping
> that it would be ok to ask it again here.)

> I would like to extract out the source code verbatim from code directives
> in a restructuredtext string.

> What follows is my first attempt at doing this, but I would like to know if
> there is a better (i.e. more robust, or more general, or more direct) way
> of doing it.

> Let's say I have the following rst text as a string in python:

> s = '''
> My title
> ========
>
> Use this to square a number.
>
> .. code:: python
>
>    def square(x):
>        return x**2
>
> and here is some javascript too.
>
> .. code:: javascript
>
>    foo = function() {
>        console.log('foo');
>    }
> '''

> To get the two code blocks, I could do

> from docutils.core import publish_doctree
> doctree = publish_doctree(s)
> source_code = [child.astext() for child in doctree.children if 'code'
> in child.attributes['classes']]

> Now *source_code* is a list with just the verbatim source code from the two
> code blocks. I could also use the *attributes* attribute of *child* to find
> out the code types too, if necessary.

> It does the job, but is there a better way?

Looks plain and clean to me. If you are using Docutils anyway, then this is
the way to go.

If you want to skip the overhead of parsing the complete document just to
extract code, you could also create a "minimal parser" that just looks for
".. code::" (or maybe also literal blocks) and copies the following indented
block. (This is what PyLit does.)

Günter
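
For illustration, the "minimal parser" idea could look roughly like the sketch below. It is not PyLit's actual code, and the names `extract_code_blocks` and `DIRECTIVE` are made up for this example. It assumes the directive is written exactly as ".. code::", that indentation uses spaces, and that the directive has no option lines (such as ":number-lines:"), which would otherwise end up in the copied block. Literal blocks ("::") are not handled.

```python
import re
import textwrap

# Matches a line such as ".. code:: python"; the language argument is optional.
# Group 1 is the directive's own indentation, group 2 the language (may be '').
DIRECTIVE = re.compile(r'^([ \t]*)\.\. code::[ \t]*(\S*)[ \t]*$')


def extract_code_blocks(text):
    """Return a list of (language, source) tuples, one per code directive."""
    lines = text.splitlines()
    blocks = []
    i = 0
    while i < len(lines):
        match = DIRECTIVE.match(lines[i])
        if match is None:
            i += 1
            continue
        base_indent = len(match.group(1))
        language = match.group(2)
        i += 1
        body = []
        # Copy blank lines and every line indented deeper than the directive.
        # The first non-blank line at or below the directive's indentation
        # ends the block.
        while i < len(lines):
            line = lines[i]
            indent = len(line) - len(line.lstrip())
            if line.strip() and indent <= base_indent:
                break
            body.append(line)
            i += 1
        # Remove the common indentation and surrounding blank lines.
        source = textwrap.dedent('\n'.join(body)).strip('\n')
        blocks.append((language, source))
    return blocks
```

Calling `extract_code_blocks(s)` on the rst string from the question should give one tuple per directive, with the language name first and the dedented source second. Unlike the doctree approach, this does no validation of the rst at all, so it is only reasonable when the input is known to be well-formed.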