Re: [Docutils-users] extract code from code directive from restructuredtext using docutils


On 2015-02-03, Mark Andrews wrote:


> (I asked the following question on stackoverflow two days ago. It has not
> been given any answers or even comments and very few views, so I was hoping
> that it would be OK to ask it again here.)

> I would like to extract the source code verbatim from code directives
> in a reStructuredText string.

> What follows is my first attempt at doing this, but I would like to know if
> there is a better (i.e. more robust, or more general, or more direct) way
> of doing it.

> Let's say I have the following rst text as a string in python:

> s = '''

> My title
> ========

> Use this to square a number.

> .. code:: python

>    def square(x):
>        return x**2

> and here is some javascript too.

> .. code:: javascript

>     foo = function() {
>         console.log('foo');
>     }

> '''

> To get the two code blocks, I could do

> from docutils.core import publish_doctree

> doctree = publish_doctree(s)
> source_code = [child.astext() for child in doctree.children
>                if 'code' in child.attributes['classes']]

> Now *source_code* is a list with just the verbatim source code from the two
> code blocks. I could also use the *attributes* attribute of *child* to find
> out the code types too, if necessary.

> It does the job, but is there a better way?

Looks plain and clean to me. If you are using Docutils anyway, then this is
the way to go. If you want to skip the overhead of parsing the complete
document just to extract code, you could also create a "minimal parser"
that just looks for ".. code::" (or maybe also literal blocks) and copies
the following indented block. (This is what PyLit does.)
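
A rough sketch of such a "minimal parser" might look like the following.
It is not PyLit's actual code, and the function name extract_code_blocks
is made up for illustration. It assumes the directive has no option lines
(such as ":number-lines:") and it ignores "::" literal blocks.

```python
import re
import textwrap

# Matches a line such as ".. code:: python"; the language argument is optional.
DIRECTIVE = re.compile(r"^(\s*)\.\. code::\s*(\S*)\s*$")


def extract_code_blocks(text):
    """Return (language, code) pairs for each ".. code::" directive in text.

    Hypothetical helper: a line-based scan, not a real reStructuredText parser.
    """
    lines = text.splitlines()
    blocks = []
    i = 0
    while i < len(lines):
        match = DIRECTIVE.match(lines[i])
        i += 1
        if match is None:
            continue
        base_indent = len(match.group(1))
        language = match.group(2)
        body = []
        # Collect blank lines and lines indented deeper than the directive.
        while i < len(lines):
            line = lines[i]
            indent = len(line) - len(line.lstrip())
            if line.strip() and indent <= base_indent:
                break  # first non-blank line at or left of the directive
            body.append(line)
            i += 1
        # Remove the common indentation and surrounding blank lines.
        code = textwrap.dedent("\n".join(body)).strip("\n")
        blocks.append((language, code))
    return blocks
```

Calling extract_code_blocks(s) on the string from the question should give
one (language, code) pair per directive, in document order. The trade-off
is that anything the real parser would handle for you (directive options,
nested structures, language aliases) has to be dealt with by hand.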

Günter