Re: [Docutils-develop] reST.php?

Dethe Elza wrote:
> I didn't know about asdom(), I'll have to explore that to see how
> expensive it is and which DOM implementation it uses.

It uses xml.dom.minidom, but it's intended to be able to use any DOM
implementation (it's parameterized).  I haven't actually tried it with
any other DOM, so I don't know if it works or not.  It's a case of
"don't add functionality until it's needed".  So if it's needed, it
can be added.  (By those who need it, of course!)
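
To illustrate what "parameterized" means here, a minimal sketch follows.
The ``Node`` class is a hypothetical stand-in, not the real Docutils
class; the only assumption about the alternate DOM is that it offers a
``Document()`` factory compatible with minidom's.

```python
import xml.dom.minidom


class Node:
    """Hypothetical stand-in for a doctree element (not the Docutils class)."""

    def __init__(self, tagname, text="", children=()):
        self.tagname = tagname
        self.text = text
        self.children = list(children)

    def _dom_node(self, domroot):
        # Build this element using whatever document object we were handed.
        element = domroot.createElement(self.tagname)
        if self.text:
            element.appendChild(domroot.createTextNode(self.text))
        for child in self.children:
            element.appendChild(child._dom_node(domroot))
        return element

    def asdom(self, dom=None):
        # The DOM implementation is a parameter; minidom is only the default.
        if dom is None:
            dom = xml.dom.minidom
        domroot = dom.Document()
        domroot.appendChild(self._dom_node(domroot))
        return domroot


tree = Node("document", children=[Node("paragraph", "Hello, DOM.")])
xml_text = tree.asdom().documentElement.toxml()
print(xml_text)
```

Passing another module as ``dom`` would only work if it mimics the
minidom calls used above, which is exactly the untested part.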

Please don't think of *any* part of the current Docutils
implementation as written in stone.  It's all experimental, all
subject to change if we discover it's broken or deficient in some way.
That's what the "0." part of the "0.2" ("0.2.3", currently) version
number is meant to imply.

> I still think that the internals could be simplified and that this
> would encourage more participation in the project.

There are hairy parts of the parser code which could use refactoring,
true.  You're working on one of them: the directive parser.  In the
parser's core code (states.py), there's some duplicate code that could
be refactored.  But that's low priority on my list at present.  It
ain't broke (much), so it's not important (to me) to fix it (yet).

> I've seen some complaints about the complexity of reST in toto,
> which I think could be addressed by modularizing reST, but that's
> another issue.

It's just people mistaking richness for complexity, and the ubiquitous
"I don't need it, therefore it's superfluous" crap.  It's mostly
bogus.  We can modularize the parser on a case-by-case basis, but I
think a general modularization will only increase the perceived
complexity.  Some modularization is already in place: the code behind
the "--pep-references" and "--rfc-references" options was ripped out
of the PEP reader, so that it could be used any time.

> I agree that the parser is the most complex component of reST, so if
> it focuses on the parser and reuses architecture from the python
> libraries for the rest of reST it may be easier to grok for a
> programmer coming to it fresh.

Ideally yes, but in practice it's too expensive.  It's just too
convenient for a transform to say ``isinstance(node, Body)``, since
the element hierarchy is built in to nodes.py.  This would be painful
if we were using a DOM tree.
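
A rough sketch of the difference, using hypothetical miniature classes
(the real nodes.py hierarchy is far richer, and ``BODY_TAGS`` is an
invented name).  With category classes in the hierarchy, classification
is a single ``isinstance`` test; with a DOM, the same knowledge has to
live in a separate table of tag names that must be kept in sync by hand.

```python
import xml.dom.minidom


class Node:
    """Hypothetical base class for this sketch only."""

    def __init__(self, *children):
        self.children = list(children)

    def walk(self):
        # Depth-first traversal, parents before children.
        yield self
        for child in self.children:
            yield from child.walk()


# Category classes: no behavior, only classification.
class Body: pass
class Inline: pass

class document(Node): pass
class paragraph(Body, Node): pass
class bullet_list(Body, Node): pass
class emphasis(Inline, Node): pass


tree = document(paragraph(emphasis()), bullet_list())
body_names = [type(n).__name__ for n in tree.walk() if isinstance(n, Body)]

# The same question asked of a DOM tree: the categories are no longer in
# the class hierarchy, so they are spelled out as a set of tag names.
BODY_TAGS = {"paragraph", "bullet_list"}


def walk_dom(node):
    yield node
    for child in node.childNodes:
        yield from walk_dom(child)


dom = xml.dom.minidom.parseString(
    "<document><paragraph><emphasis/></paragraph><bullet_list/></document>")
dom_body_names = [n.tagName for n in walk_dom(dom.documentElement)
                  if n.nodeType == n.ELEMENT_NODE and n.tagName in BODY_TAGS]

print(body_names, dom_body_names)
```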

> I thought a version of PyXML was part of the core now, but not
> 4Suite.

There's xml.dom (.minidom, .pulldom), xml.sax, and xml.parsers.expat.
That's it.  Your build or distribution of Python may include PyXML, but
it's not part of the core.

> Besides, Optik is not part of core python, but it drastically
> simplifies the reST code, so it's included in docutils.

Optik *will* be part of core Python, as of 2.3.  And it's small enough
that we can include it with Docutils now.  One of the goals is for
Docutils itself to be included in the core stdlib, so we can only use
modules already part of the stdlib or scheduled to be included.  We
can't presume to get PyXML into the stdlib riding on Docutils'
coattails.

>>> And my *other* project of converting existing HTML and DocBook
>>> documents into reST for maintenance would be that much easier!
>> 
>> I don't follow this at all.  Can you elaborate?
> 
> Sorry, that wasn't very clear.  I want to think of the reST DOM as
> its canonical form, so I can transform XHTML and DocBook to reST
> via XSLT.  Ideally I also want a writer to create reST from the reST
> DOM.

That last sentence seems to me the key, and that's exactly what's
missing.  Transforming canonical Docutils DOM to internal nodes.py
doctree should be reasonably easy, but without the reStructuredText
writer, it's practically useless.  If I were writing a converter from
XHTML or DocBook to reStructuredText, I would do it in a quick & dirty
way, completely independently of Docutils (as I've described before).
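
For a sense of scale, here is one possible shape for such a quick &
dirty converter.  It is only a sketch under stated assumptions, and not
necessarily the approach described earlier: the input is well-formed
XHTML, only ``h1``, ``p``, ``hr``, ``em``, ``strong`` and ``code`` are
handled, nothing is escaped, and the function names are invented for
the example.

```python
import xml.dom.minidom

# Inline XHTML tags mapped to the reStructuredText markers around them.
INLINE = {"em": "*", "strong": "**", "code": "``"}


def inline_text(node):
    """Flatten a node's children to text, wrapping known inline tags."""
    parts = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            parts.append(child.data)
        elif child.nodeType == child.ELEMENT_NODE:
            marker = INLINE.get(child.tagName, "")
            parts.append(marker + inline_text(child) + marker)
    # Collapse runs of whitespace, as a browser would.
    return " ".join("".join(parts).split())


def xhtml_to_rest(source):
    """Convert top-level h1/p/hr children of <body> to reST blocks."""
    body = xml.dom.minidom.parseString(source).getElementsByTagName("body")[0]
    blocks = []
    for child in body.childNodes:
        if child.nodeType != child.ELEMENT_NODE:
            continue
        if child.tagName == "h1":
            title = inline_text(child)
            blocks.append(title + "\n" + "=" * len(title))
        elif child.tagName == "p":
            blocks.append(inline_text(child))
        elif child.tagName == "hr":
            blocks.append("-" * 10)  # a reST transition
        # Any other element is silently dropped in this sketch.
    return "\n\n".join(blocks) + "\n"


sample = ("<html><body><h1>Title</h1><p>Some <em>rich</em> text.</p>"
          "<hr/><p>After.</p></body></html>")
rest_text = xhtml_to_rest(sample)
print(rest_text)
```

Note that it never touches Docutils at all; it just writes text.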

In addition, there's a lot going on behind the scenes that the
Docutils DTD doesn't expose.  Try running ``html.py --dump-internals
input.txt ...`` to see what I mean.  ("--dump-internals" is an
internal, hidden option, for debugging.)

> Thanks, that's a good start.  DocBook has added support for
> describing EBNF in documentation, as well as including modules for
> MathML and SVG; it is essentially a superset of XmlSpec, which is
> the *other* widely used XML documentation format (at least in the
> W3C).  I just had a wild idea that instead of inventing a new XML
> DTD for internal structure, reST could use DocBook (or a subset of
> it) for its DOM representation. Like I said, a wild idea.

I looked at DocBook, XMLSpec, TEI, and HTML (and I've worked
extensively with DocBook and TEI in the past), but none of them fit.
I've borrowed ideas from some of them.  It was easier to start fresh
than try to shoehorn an existing system into use in Docutils.

>> For example, I know of no DTD that has the equivalent of a
>> "transition" element, although they're quite common in novels and
>> articles.
> 
> <hr /> doesn't qualify?

That's true, that's the one example I do know of.  But HTML doesn't
really count; I think of it as an output format, not as descriptive
markup.  I'm still puzzled why there's no transition equivalent in
TEI, which is a DTD for publishers!  They probably just use an empty
paragraph or some other kludge.  I don't know of a decent/standard DTD
for novels out there.

> Thanks for the eloquent feedback.

Any time.  Thank *you* for the penetrating questions.

-- 
David Goodger  <go...@us...>  Open-source projects:
  - Python Docutils: http://docutils.sourceforge.net/
    (includes reStructuredText: http://docutils.sf.net/rst.html)
  - The Go Tools Project: http://gotools.sourceforge.net/