Thread: [Docutils-develop] Docutils' status, (backward) compatibility, document schema classification/compa

All,

I used to follow the list, although I didn't participate much. I'm glad to
see Du still going strong, with another release this year. Now I'm taking
up my own docutils project again after a quiet period, and I have a sort of
brain dump to get myself started and to share some discussion points and
questions.

As coincidence would have it, rst2rst was discussed again recently; it is
one of my pet projects, although I don't think I ever announced it here. My
need-case is enriching data. For fixed formats that is easy to do with
ad-hoc scripts, but I can't find a satisfying way to apply that approach to
my scenario (not to mention that fixing everything with sed/grep/<insert
script language> gets a bit tiresome if you need it too often; besides, I
think a handwritten document deserves a bit more elegance).

I'd like to 'rewrite' certain sets of notes at various places. By
enriching I mean rewriting e.g. references or entire elements based on
certain heuristics (specific to that set of notes). Some files may be
generated metadata 'hypercards', which I later extend with handwritten
content while adhering to some basic rSt substructure. In another case I
wish I could write a few keywords and have the publisher look up the
references. The frontends I write myself of course, and with Du's
publisher or Nabu I can do much of what I need.

But in any case, rewriting reStructuredText should be part of docutils; or
at least I thought that was in its charter? Anyway, I have a few days off
and plan to check out the following, which is why I turn to the mailing
list. What is 'so hard', btw, is not really serializing the document
itself; that is pretty straightforward and satisfying to do, imo. The less
straightforward things are whitespace (disregarding whitespace can make
rSt notes look interesting but unwieldy) and the fact that not all
document elements and attributes are completely lossless, but that too is
easily solved using an updated Reader, as the sandbox rstwriter shows. The
hard part is knowing what exactly is in the document, and that is actually
why I have some questions.
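To illustrate why serializing itself is the easy part, here is a minimal sketch that turns paragraphs with simple inline markup back into rSt. It is not the sandbox rstwriter; it assumes Docutils-XML-like input using the element names `paragraph`, `emphasis`, `strong` and `literal`, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Docutils inline element names mapped to their rSt start/end-strings.
INLINE_MARKUP = {"emphasis": "*", "strong": "**", "literal": "``"}


def inline_to_rst(elem):
    """Serialize an element's mixed content (text plus inline children)."""
    parts = [elem.text or ""]
    for child in elem:
        mark = INLINE_MARKUP.get(child.tag)
        if mark is None:
            # Unknown inline element: keep its content, drop the markup.
            parts.append(inline_to_rst(child))
        else:
            # rSt has no nested inline markup, so flatten the child's text.
            parts.append(mark + "".join(child.itertext()) + mark)
        parts.append(child.tail or "")
    return "".join(parts)


def paragraphs_to_rst(xml_text):
    """Emit every <paragraph> as an rSt paragraph, separated by blank lines."""
    root = ET.fromstring(xml_text)
    return "\n\n".join(inline_to_rst(p) for p in root.iter("paragraph")) + "\n"


sample = (
    "<document><paragraph>Plain, <emphasis>emphasized</emphasis> and "
    "<literal>code</literal>.</paragraph>"
    "<paragraph>Second <strong>paragraph</strong>.</paragraph></document>"
)
print(paragraphs_to_rst(sample))
```

What the sketch ignores is exactly the hard part: it copies whitespace verbatim, does no line wrapping, and does not escape characters such as `*` that would be misread as markup on the way back in.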

As for rst2rst, I'm targeting the easy subsets first. A completely
lossless rst-rewriter for the entire rSt markup set is not very high on
my priorities either. But the beauty is that, with the proper components,
I don't see why it wouldn't be possible, which reminds me of something
else. Has anyone ever considered standardizing compatibility for foreign
structures? With the rSt parser, encapsulating or stripping unknown
directives should be easy enough. Perhaps not so easy for all sorts of
foreign markup/dialects, but it could be a start. Du has a nice error
reporting system and in a sense already encapsulates offending markup as
literal blocks.
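A rough sketch of "encapsulating or stripping unknown directives", done here as a line-based pre-pass over rSt text rather than inside the rSt parser (a deliberate simplification). The set of known directives is a hypothetical target profile, and only top-level, unindented directives are handled.

```python
import re

# Directive names the (hypothetical) target frontend understands.
KNOWN_DIRECTIVES = {"note", "warning", "image", "code"}

# A top-level explicit markup line such as ".. name::" (indented directives
# are not handled in this sketch).
DIRECTIVE = re.compile(r"^\.\.\s+([\w.:+-]+?)::(\s|$)")


def handle_unknown_directives(text, known, mode="encapsulate"):
    """Strip unknown directives, or wrap them in a literal block.

    mode="encapsulate" keeps the directive verbatim inside a "::" literal
    block; any other mode (e.g. "strip") drops it.
    """
    lines = text.splitlines()
    out = []
    i = 0
    while i < len(lines):
        match = DIRECTIVE.match(lines[i])
        if not match or match.group(1) in known:
            out.append(lines[i])
            i += 1
            continue
        # The directive block is the marker line plus every following line
        # that is blank or indented.
        j = i + 1
        while j < len(lines) and (not lines[j].strip() or lines[j][0] in " \t"):
            j += 1
        # Trailing blank lines separate blocks, so leave them outside.
        while j > i + 1 and not lines[j - 1].strip():
            j -= 1
        if mode == "encapsulate":
            out += ["::", ""]
            out += ["   " + line if line.strip() else "" for line in lines[i:j]]
        i = j
    return "\n".join(out) + "\n"


sample = """Intro paragraph.

.. note:: This is known.

.. mermaid::
   :caption: diagram

   graph TD; A-->B

Closing paragraph.
"""
print(handle_unknown_directives(sample, KNOWN_DIRECTIVES))
```

A paragraph consisting only of "::" is removed by the rSt parser, so the encapsulated directive simply becomes a literal block; the parser-based route would instead hook into Docutils' own handling of unknown directives.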

But I was thinking: to proceed, I need a matrix of which structures the
various frontends support. I plan to use rSt for hypertext and metadata
purposes; I have used it for years for notes, with the enriching scheme
described above, and also at work and in OSS projects. However, there are
various implementations around, each of which may be more convenient in a
certain workflow. From what I know of Du and have seen of pandoc and some
wiki scripts, the latter can hardly implement all of the rich nested
structure Du provides. Also, isn't nested inline markup another Du
requirement that has not been met yet? That makes it slightly more complex
still.

To properly describe documents, I need something more than a list of used
elements. What I'm looking for is a way to show differences between
document type descriptions or schema definitions and list those. Based on
the absence or presence of these 'specifications' the document could then
be categorized. This yields an endless list of different and
overlapping classes that people or computers could care about. So that
would be a list of 'features' of a document, in the HTTP TCN (transparent
content negotiation) sense.
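One simple way to get such a feature list is to treat the set of element types a document uses as its features and compare that set against a per-frontend support matrix. In this sketch the frontend names and their supported sets are placeholders, not a survey of real tools; the element names follow the Docutils doctree.

```python
import xml.etree.ElementTree as ET

# Hypothetical support matrix: which element types each frontend handles.
# These sets are placeholders, not a survey of real tools.
SUPPORT = {
    "frontend_a": {"document", "section", "title", "paragraph", "emphasis",
                   "strong", "literal", "reference", "bullet_list", "list_item"},
    "frontend_b": {"document", "section", "title", "paragraph", "emphasis",
                   "strong", "literal"},
}


def document_features(xml_text):
    """Feature set of a document: the element types it uses."""
    return {elem.tag for elem in ET.fromstring(xml_text).iter()}


def classify(features, support):
    """Map each frontend to the features it lacks (empty set = full support)."""
    return {name: features - supported for name, supported in support.items()}


sample = ("<document><section><title>Notes</title><paragraph>See "
          "<reference refuri='https://example.org/'>this</reference>."
          "</paragraph></section></document>")

for frontend, missing in classify(document_features(sample), SUPPORT).items():
    print(frontend, sorted(missing) or "fully supported")
```

A flat element list like this cannot express nesting restrictions, which is where the schema question comes in.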

The nesting intimidates me slightly, though, since I need to interpret
and compare relations in the schema too. Do I need to infer all the
nesting from a given schema to show the difference(s)? (I guess I could
use something like XPath/XSLT, depending on the format's complexity.) I'd
like to at least consider a generic approach, perhaps eyeing Markdown,
YAML or other text/outline formats, to be able to cover a
wide range of documents like pandoc does.
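A cheap first approximation, short of inferring full content models, is to reduce both schemas and documents to sets of parent/child relations and compare them with set differences. The two schemas below are hypothetical and heavily simplified; a real DTD or RELAX NG schema also encodes order and cardinality, which this representation drops.

```python
import xml.etree.ElementTree as ET


def nesting_pairs(xml_text):
    """All (parent, child) element-name pairs that occur in a document."""
    root = ET.fromstring(xml_text)
    return {(parent.tag, child.tag) for parent in root.iter() for child in parent}


def schema_diff(a, b):
    """Relations allowed by one schema but not the other."""
    return a - b, b - a


# Hypothetical, simplified schemas: sets of allowed parent -> child
# relations (order and cardinality are ignored).
FLAT_INLINE = {
    ("document", "paragraph"),
    ("paragraph", "emphasis"),
    ("paragraph", "strong"),
}
NESTED_INLINE = FLAT_INLINE | {("strong", "emphasis"), ("emphasis", "strong")}

doc = ("<document><paragraph><strong>bold <emphasis>and italic</emphasis>"
       "</strong></paragraph></document>")

# Relations the document uses that the flat-inline schema does not allow.
print(nesting_pairs(doc) - FLAT_INLINE)
# Relations allowed only by the nested-inline schema, and only by the flat one.
print(schema_diff(NESTED_INLINE, FLAT_INLINE))
```

The nested-inline document above is flagged against the flat schema but passes the nested one, which is the kind of distinction a comparison matrix would need to record.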

Does anyone have a suggestion on what metalanguage to use in my
'comparison matrix'? Or could you point out what prior art there is?
Academic work or practical, concrete instances are equally welcome.

Well, thanks for reading. I guess I already have my initial answer from
writing this post: read 'validation' for 'specification'. I just
remembered I'll have to look at some XML tooling next; it's been a long
break. Sorry to test your tl;dr reflexes with multiple questions, but I did
not feel like splitting this post up after writing it. I would have stayed
on topic if I had known that beforehand, my apologies.

I do hope someone will clarify the status of Python Docutils: are we in
maintenance mode only, or is there someone willing to take new directions?

regards, Berend

PS: the absence of a Python 3000 Du and some additional digging lead me
to conclude that there is no longer any aspiration or need to get Du into
the stdlib? Of course, using a completely customizable hypertext publisher
for source documentation may be considered a bit too ambitious, especially
considering what document publishers usually end up as...

---

 Democracy is two wolves and a sheep deciding what to eat for lunch.
 Liberty is a well-armed sheep contesting the vote.
