From: <li...@do...> - 2013-11-19 11:40:53
|
All,

I used to follow the list, although I didn't participate much. I'm glad to see Du still going strong, with another release this year. Now I'm taking up my own docutils project again after a quiet period, and I have a sort of brain dump to get myself started and to share some discussion points and questions.

Coincidence has it that rst2rst was discussed again, which is one of my pet projects, although I don't think I ever announced it here. My 'need-case' is enriching data. For fixed formats this is easily done with ad-hoc scripts. But I can't find a use case for such a scenario myself (not to mention that fixing everything with sed/grep/<insert script language> gets a bit tiresome if you need it too much; besides, I think a handwritten document deserves a bit more elegance). I'd like to 'rewrite' certain sets of notes at various places. By enriching I mean rewriting, e.g., references or entire elements based on certain heuristics (specific to that set of notes). Some files may be generated metadata 'hypercards', which I later extend with handwritten content while adhering to some basic rSt substructure. In another case I wish I could write a few keywords and have the publisher look up the references.

The frontends I write myself, of course, and with Du's publisher or Nabu I can do much of what I need. But in any case, rewriting reStructuredText should be part of docutils; or did I think that was in its charter? Anyway, I have a few days off and plan to check out the following, which is why I turn to the mailing list.

What is 'so hard', btw, is not really serializing the document itself; that is pretty straightforward and satisfying to do, imo. The things that are not straightforward are whitespace (disregarding whitespace can make rSt notes look interesting but unwieldy) and the fact that not all document elements and attributes are completely lossless, but that too is easily solved using an updated Reader, as the sandbox rstwriter shows.
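To make the point that serializing the document itself is the easy part a bit more concrete, here is a minimal sketch that walks a tiny subset of a Docutils XML doctree (as produced by rst2xml) and writes it back as reStructuredText. It is illustrative only and assumes a lot: it handles just sections, titles, paragraphs and the emphasis/strong/literal inline elements, it uses only the standard library's xml.etree.ElementTree rather than Docutils' own node classes, and the function names (`inline_to_rst`, `doctree_to_rst`) and the fixed underline characters are hypothetical choices of this sketch, not part of Docutils or the sandbox rstwriter. It also ignores exactly the whitespace and lossless-attribute problems mentioned above.

```python
import xml.etree.ElementTree as ET

# Inline element name -> rST delimiter (only a small subset of Docutils inline nodes).
INLINE_MARKUP = {"emphasis": "*", "strong": "**", "literal": "``"}

# Title underline character per section depth; this sketch supports four levels.
UNDERLINES = "=-~^"


def inline_to_rst(elem):
    """Serialize an element's text, wrapping known inline children in rST markup."""
    parts = [elem.text or ""]
    for child in elem:
        inner = inline_to_rst(child)
        delim = INLINE_MARKUP.get(child.tag)
        # Unknown inline elements keep their text but lose their markup.
        parts.append(f"{delim}{inner}{delim}" if delim else inner)
        parts.append(child.tail or "")
    return "".join(parts)


def doctree_to_rst(elem, depth=0):
    """Return a list of rST blocks for a <document> or <section> element."""
    blocks = []
    for child in elem:
        if child.tag == "title":
            text = inline_to_rst(child)
            blocks.append(text + "\n" + UNDERLINES[depth] * len(text))
        elif child.tag == "paragraph":
            blocks.append(inline_to_rst(child))
        elif child.tag == "section":
            blocks.extend(doctree_to_rst(child, depth + 1))
        # All other elements are silently dropped; a real writer must not do this.
    return blocks


if __name__ == "__main__":
    sample = (
        "<document><title>Notes</title>"
        "<paragraph>Some <emphasis>enriched</emphasis> text with "
        "<literal>code</literal>.</paragraph>"
        "<section><title>Details</title>"
        "<paragraph>More <strong>detail</strong> here.</paragraph></section>"
        "</document>"
    )
    print("\n\n".join(doctree_to_rst(ET.fromstring(sample))))
```

Joining the blocks with blank lines is what turns the list into a valid rST body; a lossless rewriter would instead have to remember the original blank lines and indentation.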
But the hard part is what exactly is in the document, and that is actually why I have some questions. Additionally, regarding rst2rst, I'm targeting the easy subsets first for now. A completely lossless rSt rewriter for the entire rSt markup set would not be very high on my priorities either. But the beauty is, with the proper components I don't see why it's not possible, which reminds me of something else.

Has compatibility for foreign structures ever been considered for 'standardization'? With the rSt parser, encapsulating or stripping unknown directives should be easy enough. Perhaps not so easy for all sorts of foreign markup/dialects, but it could be a start. Du has a nice error reporting system and in a sense already encapsulates offending markup as literal blocks.

But I was thinking: to proceed, I need a matrix of which structures the various frontends support. I plan to use rSt for hypertext and metadata purposes; with the enriching scheme described above, I have used it for years for notes, and also at work and in OSS projects. However, there are various implementations around, each of which may be more convenient in a certain workflow. But from what I know of Du and have seen of pandoc and some wiki scripts, they can hardly implement all of the rich nested structure provided by Du. Also, I think nested inline markup is another Du requirement not met yet? So it gets slightly more complex still.

To properly describe documents, I need something more than a list of used elements. What I'm looking for is a way to show differences between document type descriptions or schema definitions and list those. Based on the absence or presence of these 'specifications', the document could then be categorized. This gives an endless list of different and overlapping classes that people or computers could care about. So that would be a list of 'features' of a document, in the HTTP TCN (transparent content negotiation) sense.
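One cheap way to get 'something more than a list of used elements' is to record which element occurs directly inside which. The sketch below assumes Docutils XML input (e.g. from rst2xml) and uses only the standard library; it collects the set of (parent, child) element pairs of a document, which captures one level of nesting and could serve as a crude feature list. The function name `nesting_features` is hypothetical, and attributes as well as deeper ancestry are deliberately ignored.

```python
import xml.etree.ElementTree as ET


def nesting_features(root):
    """Return the set of (parent_tag, child_tag) pairs occurring in the tree."""
    features = set()
    for parent in root.iter():  # iter() visits every element, depth-first
        for child in parent:
            features.add((parent.tag, child.tag))
    return features


if __name__ == "__main__":
    doc = ET.fromstring(
        "<document><paragraph>A <emphasis>nested</emphasis> word.</paragraph>"
        "<bullet_list><list_item><paragraph>item</paragraph></list_item></bullet_list>"
        "</document>"
    )
    for parent, child in sorted(nesting_features(doc)):
        print(f"{parent} > {child}")
```

Pairs are the smallest unit that distinguishes, say, a paragraph at document level from a paragraph inside a list item; full ancestor paths would be more precise but would make the feature space much larger.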
Since there is nesting going on, I get slightly intimidated, though, since I need to interpret and compare relations in the schema too. Do I need to infer all the nesting from a given schema to show the difference(s)? (So I guess I could use something like XPath/XSLT, depending on the format complexity.) I'd like to at least consider a generic approach, perhaps eyeing markdown, yaml or other text/outline formats, to be able to cover a wide range of documents like pandoc does. Would anyone have a suggestion on what metalanguage to use for my 'comparison matrix'? Or perhaps point out what sort of prior art exists; whether it is academic work or some practical, concrete instances doesn't matter.

Well, thanks for reading. I guess I have my initial answer already from writing this post. Read 'validation' for 'specification'; I just remembered I'll have to have a look at some XML tooling next. It's been a long break.

Sorry to test your tl;dr reflexes with multiple questions, but I did not feel like splitting this post up after writing it. I would have stayed on topic had I known it beforehand, my apologies. I do hope someone will clarify the Python Docutils status: are we on maintenance only, is there someone willing to take new directions?

regards,
Berend

PS: the absence of a Python 3000 Du and some additional digging lead me to the conclusion that there is no longer an aspiration or need to get Du into the stdlib? Of course, for source documentation, using a completely customizable hypertext publisher may be considered a bit too ambitious. Also considering what document publishers usually end up as...

---
Democracy is two wolves and a sheep deciding what to eat for lunch. Liberty is a well-armed sheep contesting the vote. |
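For the 'comparison matrix' question, a first version could be plain set arithmetic over (parent, child) element pairs, without any schema metalanguage at all. The sketch below assumes each frontend's capabilities have already been written down by hand as such pairs; the frontend names and their supported pairs are invented placeholders for illustration, not actual claims about pandoc, Docutils or any wiki engine, and `comparison_matrix` is a hypothetical helper.

```python
from typing import Dict, Set, Tuple

Feature = Tuple[str, str]  # (parent element, child element)


def comparison_matrix(doc_features: Set[Feature],
                      frontends: Dict[str, Set[Feature]]) -> Dict[str, Set[Feature]]:
    """Map each frontend name to the document features it does not support."""
    return {name: doc_features - supported for name, supported in frontends.items()}


if __name__ == "__main__":
    doc = {("document", "paragraph"), ("paragraph", "emphasis"),
           ("emphasis", "strong")}  # the last pair stands in for nested inline markup
    # Hypothetical capability lists, for illustration only.
    frontends = {
        "frontend_a": {("document", "paragraph"), ("paragraph", "emphasis")},
        "frontend_b": {("document", "paragraph"), ("paragraph", "emphasis"),
                       ("emphasis", "strong")},
    }
    for name, missing in sorted(comparison_matrix(doc, frontends).items()):
        status = "OK" if not missing else ", ".join(f"{p}>{c}" for p, c in sorted(missing))
        print(f"{name}: {status}")
```

An empty difference means the document falls into the class that frontend can handle; inferring the pairs automatically from a DTD or other schema is the harder step this sketch leaves out.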
From: Berend <be...@do...> - 2013-11-19 20:17:32
|
All, some more answers to myself...

On Tue, November 19, 2013 12:25 pm, li...@do... wrote:
> I'd like to at least consider a generic approach, perhaps
> eyeing markdown, yaml or other text/outline formats to be able to cover a
> wide range of documents like pandoc does.

Adding xmllint --valid was easy enough. A few of my test cases turned out to be invalid. I guess I can try extracting the relevant terms from the DTD to 'generate' a schema for a given rSt file.

> I do hope someone will clarify the Python Docutils status, are we on
> maintenance only, is there someone willing to take new directions?

[pep-0258] says it's independent (2001-2009).

PS: Sorry for the obscure 'From' field in my last message. I hope this fixes that. |
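xmllint's --valid option checks a file against the DTD referenced by its DOCTYPE declaration. For the idea of extracting the relevant terms from the DTD, a rough sketch follows: it pulls `<!ELEMENT name model>` declarations out of DTD text with regular expressions and keeps only the declarations (and allowed children) for elements that actually occur in a given document. This is a simplification under stated assumptions: the real docutils.dtd defines many of its content models through parameter entities (such as %structure.model;), which this sketch drops rather than expands, so it only gives faithful results on a DTD with literal content models. The helper names and the toy DTD are hypothetical, not taken from Docutils.

```python
import re
import xml.etree.ElementTree as ET

# <!ELEMENT name content-model>; DOTALL lets a declaration span several lines.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([\w.:-]+)\s+(.*?)>", re.DOTALL)
PARAM_ENTITY_REF = re.compile(r"%[\w.:-]+;")
NAME = re.compile(r"[A-Za-z_][\w.:-]*")


def parse_element_decls(dtd_text):
    """Map each declared element to the element names in its content model."""
    decls = {}
    for name, model in ELEMENT_DECL.findall(dtd_text):
        model = model.replace("#PCDATA", "")
        # Parameter-entity references are dropped, NOT expanded (a known gap).
        model = PARAM_ENTITY_REF.sub("", model)
        decls[name] = set(NAME.findall(model)) - {"EMPTY", "ANY"}
    return decls


def reduced_schema(dtd_text, doc_root):
    """Keep declarations for elements used in doc_root, limited to used children."""
    used = {elem.tag for elem in doc_root.iter()}
    decls = parse_element_decls(dtd_text)
    return {name: children & used for name, children in decls.items() if name in used}


# A toy DTD for illustration; it is not an excerpt of docutils.dtd.
SAMPLE_DTD = """
<!ELEMENT document (title?, (paragraph | section)*)>
<!ELEMENT section (title, (paragraph | section)*)>
<!ELEMENT title (#PCDATA | emphasis)*>
<!ELEMENT paragraph (#PCDATA | emphasis | literal)*>
<!ELEMENT emphasis (#PCDATA)>
<!ELEMENT literal (#PCDATA)>
"""

if __name__ == "__main__":
    doc = ET.fromstring(
        "<document><title>T</title>"
        "<paragraph>x <emphasis>y</emphasis></paragraph></document>"
    )
    for name, children in sorted(reduced_schema(SAMPLE_DTD, doc).items()):
        print(name, "->", sorted(children))
```

The result only records which children are allowed, not their order or cardinality; a generated schema that xmllint could validate against would need the full content models, which is where proper DTD tooling comes in.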
From: Guenter M. <mi...@us...> - 2013-11-20 22:45:20
|
Dear Berend,

On 2013-11-19, Berend wrote:
> On Tue, November 19, 2013 12:25 pm, li...@do... wrote:
...
>> I do hope someone will clarify the Python Docutils status, are we on
>> maintenance only, is there someone willing to take new directions?

While maintenance and stability have precedence, there is no rule forbidding additions or developments. However, there is a long list of TODO items (like nested inline markup, equation and table numbering, and BibTeX-like citation support) and bugs, a lack of manpower, and no priority list.

> [pep-0258] says its independent (2001-2009) .

As far as I understand, the idea to make Docutils part of the standard library was abandoned long ago. However, the Sphinx document processor, which builds on and extends Docutils, is the officially used tool for the Python documentation, and reStructuredText (with some Sphinx extensions) is its source language.

BTW: Docutils can be used with Py3k after installing it with distutils (`python3 setup.py`). However, development takes place on the 2.x version of the code.

Günter |