From: Kent B. <ken...@bo...> - 2019-09-01 23:24:54
Hey, I got an answer! I didn't see it until now--this list does not
have much traffic...

So, since that previous e-mail, I went off and, with some help, did
what I described, and we are very close to declaring Phase I a victory.
Phase II will be putting this to practical work.

Let me give you a report on how things have gone:

* First looked at sample code in the docutils tree for how to process
  RST data programmatically, from inside another Python program, and
  the resulting code has worked with few ongoing changes needed.

* Have written quite a few custom directives. I am usually not a fan
  of being too object oriented, but with the inheritance patterns I set
  up it is possible for me to create a new custom directive for a new
  dynamically generated table in just a few lines of code. Or for a new
  dynamically generated graphic in just a few lines of code.

* Have done a couple of custom roles for cases where we needed syntax
  that could occur without breaking off into a new paragraph.

* Pending nodes are our friends! We can drop them in the doctree as we
  go, then traverse them later, giving each a chance to convert itself
  into appropriate static data.

* The API going into this code is roughly three calls:

  1. Instantiate the thing, passing in the name of a starting RST file
     (in other words, initialize a context for everything else that
     happens);

  2. Make a call asking what dynamic data is needed, passing in an
     initially empty dictionary, get back a list of missing data, fetch
     what is missing, update the dictionary, call again...repeat until
     there is no more missing data;

  3. With the now-complete dictionary of dynamic data make a render
     call, listing the needed output format(s), and get back the
     results. Make more render calls on the same context if desired.

  The code on the RST side is kept completely ignorant of where the
  dynamic data actually comes from, and the code doing the data fetch
  is kept completely ignorant of all details of what RST looks like or
  might mean. This was important, though it was not necessarily obvious
  that it was needed, and it was not trivial to do.

* Not everything can be done as a pending node; some things need to
  happen at parse time. This was a problem! We aren't changing RST
  syntax, we mostly have no gripes with what the parser does and had no
  interest in rewriting or otherwise messing with it, but when the
  parser hits one of our custom directives or custom roles and that
  code immediately needs some external data, there is no provision to
  communicate those needs up the call chain.

  So we go around the call chain: We run the parser as its own thread,
  and when our code needs to communicate with the outside world it does
  so over a pair of Python queues, one for communicating out what
  dynamic data is needed, the other for getting back in the requested
  stuff. When it has what it needs, it acts accordingly, and the
  parsing continues per normal until some other directive or role needs
  additional data at parse time.

  This is admittedly a hack, but I think a reasonable one. The state
  machine code that implements the three-call API, initializes things,
  fires up the second thread, etc., is not simple, but its
  responsibilities are otherwise very limited, so it has been stable
  for some time and we don't need to mess with it much. That file is
  less than 500 lines long, and that's with a lot of comments. Not bad.

Building tables programmatically with ASCII art---um, no, correction:
utf-8 art!---is a bit ugly. And building tables by creating the right
doctree nodes directly is not exactly well documented. At the moment we
are doing some of each and we will need to rationalize that.

We have a couple of custom directives that we don't use from our RST
files at all; rather, they are instantiated programmatically by custom
roles, to set up pending operations.

It all works pretty well.
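
To illustrate the parser-thread arrangement described above, here is a
minimal, standard-library-only sketch. It is not the actual
implementation: the names RenderContext, missing, render, and
fake_parse are hypothetical, a plain function stands in for the
docutils parser and the custom directives, the constructor takes that
function rather than an RST file name, and render() ignores output
formats. It only shows how a worker thread can ask for data over one
queue and receive it over another while the caller keeps polling for
what is missing.

```python
import queue
import threading


class RenderContext:
    """Hypothetical three-call wrapper around a parser run in a thread."""

    _DONE = object()  # sentinel the worker posts when parsing has ended

    def __init__(self, parse_fn):
        self._requests = queue.Queue()  # worker -> caller: key needed
        self._replies = queue.Queue()   # caller -> worker: value for key
        self._pending = None            # key the worker is blocked on
        self._finished = False
        self.result = None
        worker = threading.Thread(
            target=self._run, args=(parse_fn,), daemon=True)
        worker.start()

    def _run(self, parse_fn):
        try:
            # parse_fn stands in for the parser; it calls need(key)
            # whenever a directive or role wants external data.
            self.result = parse_fn(self._need)
        finally:
            self._requests.put(self._DONE)

    def _need(self, key):
        """Runs on the worker thread; blocks until the value arrives."""
        self._requests.put(key)
        return self._replies.get()

    def missing(self, data):
        """Return keys not yet in `data`; [] means parsing is complete."""
        while not self._finished:
            if self._pending is None:
                item = self._requests.get()
                if item is self._DONE:
                    self._finished = True
                    break
                self._pending = item
            if self._pending not in data:
                return [self._pending]
            self._replies.put(data[self._pending])
            self._pending = None
        return []

    def render(self):
        if not self._finished:
            raise RuntimeError("call missing() until it returns []")
        return self.result


def fake_parse(need):
    # Stand-in for a document whose directives need data at parse time.
    return "{} has {} open tickets".format(
        need("customer"), need("ticket_count"))


SOURCE = {"customer": "ACME", "ticket_count": 3}  # stand-in data store

ctx = RenderContext(fake_parse)
data = {}
asked = []
while True:
    wanted = ctx.missing(data)
    if not wanted:
        break
    asked.extend(wanted)
    data.update({key: SOURCE[key] for key in wanted})
print(ctx.render())
```

In this sketch the caller never sees the parser and the parse function
never sees the data store; the only thing that crosses the boundary is
a key going out and a value coming back, which is the separation of
concerns the three-call API is after.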

We are at the point that we /can/ get the code to do what we want
(sometimes after some exploring when we want something new); now we
mostly have to make sure it /does/ do what we want. We are dealing with
content issues now, which was the whole point.

The hardest parts were two:

1. Figuring out what custom directives and roles would accomplish what
   we needed while still being sensible RST. There was a risk of
   inventing a whole new, completely obscure, domain-specific
   programming language here. I needed to avoid that. Our source
   documents need to be something that normal people who know how to
   write can write.

2. Segregating concerns wherever we could, and keeping the
   implementation as architecturally clean as possible.

I have left out a lot here. There are other system concerns that had to
be accounted for, yet still chased away and not allowed to creep in and
complexify this into never working.

It is hard to make something simple. I think we have done a pretty
decent job.

-kb