Re: [Docutils-users] Writing Custom Directives, Expand Using Dynamic Data?

Hey, I got an answer! I didn't see it until now--this list does not have 
much traffic...

So, since that previous e-mail, I went off and, with some help, did what 
I described, and we are very close to declaring Phase I a victory. Phase 
II will be putting this to practical work.

Let me give you a report on how things have gone:

  * First I looked at sample code in the docutils tree for how to process
    RST data programmatically, from inside another Python program, and the
    resulting code has worked with few ongoing changes needed.

  * Have written quite a few custom directives. I am usually not a fan
    of being too object oriented, but with the inheritance patterns I set
    up it is possible for me to create a new custom directive for a new
    dynamically generated table, or for a new dynamically generated
    graphic, in just a few lines of code.

  * Have done a couple of custom roles for cases where we needed syntax
    that could occur inline, without breaking off into a new paragraph.

  * Pending nodes are our friends! We can drop them in the doctree as we
    go, then traverse them later, giving each a chance to convert itself
    into appropriate static data.

  * The API going into this code is roughly three calls:

     1. Instantiate the thing, passing in the name of a starting RST
        file (in other words, initialize a context for everything else
        that happens);
     2. Make a call asking what dynamic data is needed, passing in an
        initially empty dictionary, get back a list of missing data,
        fetch what is missing, update the dictionary, call
        again...repeat until there is no more missing data;
     3. With the now-complete dictionary of dynamic data, make a render
        call, listing the needed output format(s), and get back the
        results. Make more render calls on the same context if desired.

    The code on the RST side is kept completely ignorant of where the
    dynamic data actually comes from, and the code doing the data fetch
    is kept completely ignorant of all details of what RST looks like or
    might mean. This separation was important, though it was not obvious
    up front that it was needed, and it was not trivial to do.

  * Not everything can be done as a pending node; some things need to
    happen at parse time.

    This was a problem! We aren't changing RST syntax, we mostly have no
    gripes with what the parser does and had no interest in rewriting or
    otherwise messing with it, but when the parser hits one of our
    custom directives or custom roles and that code immediately needs
    some external data, there is no provision to communicate those needs
    up the call chain. So we go around the call chain: We run the parser
    as its own thread and when our code needs to communicate with the
    outside world it does so over a pair of Python queues, one for
    communicating out what dynamic data is needed, the other for getting
    back in the requested stuff. When it has what it needs, it acts
    accordingly, and the parsing continues per normal until some other
    directive or role needs additional data at parse time. This is
    admittedly a hack, but I think a reasonable one.
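
A minimal sketch of the two-queue arrangement described in the last
item, not the actual code. The parser here is a hypothetical stand-in
(fake_parse) that treats any line starting with "need:" as a request for
external data; run, fetch, and the None sentinel are likewise
assumptions made for illustration.

```python
import queue
import threading


def fake_parse(lines, requests, replies, output):
    """Hypothetical stand-in for the parser; runs in its own thread."""
    for line in lines:
        if line.startswith("need:"):
            # Announce what is needed, then block until it arrives.
            requests.put(line[len("need:"):])
            line = replies.get()
        output.append(line)
    requests.put(None)  # sentinel: parsing has finished


def run(lines, fetch):
    """Drive the parser thread, answering each data request via fetch."""
    requests, replies, output = queue.Queue(), queue.Queue(), []
    worker = threading.Thread(
        target=fake_parse, args=(lines, requests, replies, output)
    )
    worker.start()
    while True:
        key = requests.get()
        if key is None:
            break
        replies.put(fetch(key))
    worker.join()
    return output


result = run(["intro", "need:total", "outro"], lambda key: key.upper())
```

The parser side only ever blocks on replies.get(), so from its point of
view the data appears in the middle of an ordinary call; the side doing
the fetching never sees any parser state.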

The state machine code that implements the three-call API, initializes 
things, fires up the second thread, etc., is not simple, but its 
responsibilities are otherwise very limited, so it has been stable for 
some time and we don't need to mess with it much. That file is less than 
500 lines long, and that's with a lot of comments. Not bad.
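
To make the shape of the three-call API concrete, here is a minimal
sketch under loud assumptions: the class name Context, the methods
missing and render, and the {name} placeholder syntax are all invented
for illustration, and the sketch takes source text rather than a file
name so that it is self-contained. It does no RST processing at all.

```python
import re

PLACEHOLDER = re.compile(r"\{(\w+)\}")  # hypothetical placeholder syntax


class Context:
    """Hypothetical stand-in for the three-call API."""

    def __init__(self, source):
        # Call 1: set up a context for everything that follows.
        self.source = source

    def missing(self, data):
        # Call 2: report which dynamic data is still absent from data.
        return [k for k in PLACEHOLDER.findall(self.source) if k not in data]

    def render(self, data, formats):
        # Call 3: produce output once data is complete. The formats are
        # only labels in this sketch; every format gets the same text.
        text = PLACEHOLDER.sub(lambda m: str(data[m.group(1)]), self.source)
        return {fmt: text for fmt in formats}


def build(source, fetch, formats):
    """Caller side: loop on missing() until nothing is left, then render."""
    ctx = Context(source)
    data = {}
    while True:
        needed = ctx.missing(data)
        if not needed:
            break
        for key in needed:
            data[key] = fetch(key)  # fetch knows nothing about the markup
    return ctx.render(data, formats)


store = {"total": 3, "user": "kb"}
pages = build("Total: {total} for {user}", store.get, ["text"])
```

In this toy version the loop finishes after one round; the repeat matters
when answering one request can expose further needs.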

Building tables programmatically with ASCII art---um, no, correction: 
utf-8 art!---is a bit ugly. And building tables by creating the right 
doctree nodes directly is not exactly well documented. At the moment we 
are doing some of each and we will need to rationalize that.
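
For the text-generation route, a minimal sketch of emitting an RST grid
table from rows of data. The function name grid_table is hypothetical
and this is not the code we use; it only illustrates why the approach
gets ugly once column widths have to be computed by hand.

```python
def grid_table(header, rows):
    """Return an RST grid table as text, with one header row."""
    cells = [[str(c) for c in row] for row in [header] + rows]
    # Column width is the longest cell in that column. len() counts code
    # points, so wide or combining characters would misalign the grid.
    widths = [max(len(row[i]) for row in cells) for i in range(len(header))]

    def rule(ch):
        return "+" + "+".join(ch * (w + 2) for w in widths) + "+"

    def line(row):
        padded = (" " + c.ljust(w) + " " for c, w in zip(row, widths))
        return "|" + "|".join(padded) + "|"

    out = [rule("-"), line(cells[0]), rule("=")]
    for row in cells[1:]:
        out += [line(row), rule("-")]
    return "\n".join(out)


table = grid_table(["Name", "Qty"], [["apple", 3]])
```

The returned text can be fed back to the parser like any other RST.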

We have a couple of custom directives that we don't use from our RST 
files at all; rather, they are instantiated programmatically by custom 
roles, to set up pending operations.

It all works pretty well. We are at the point that we /can/ get the code 
to do what we want (sometimes after some exploring when we want 
something new), now we mostly have to make sure it /does/ do what we 
want. We are dealing with content issues now, which was the whole point.

The hardest parts were two:

 1. Figuring out what custom directives and roles would accomplish what
    we needed while still being sensible RST. There was a risk of
    inventing a whole new, completely obscure, domain-specific
    programming language here. I needed to avoid that. Our source
    documents need to be something that normal people who know how to
    write can write.
 2. Segregating concerns wherever we could, and keeping the
    implementation as architecturally clean as possible. I have left out
    a lot here; there are other system concerns that had to be accounted
    for yet still chased away and not allowed to creep in and
    complexify this into never working.

It is hard to make something simple. I think we have done a pretty 
decent job.

-kb