[Docutils-develop] Re: Handling interpreted text


[Jason Diamond]
>>> But you're letting people create new node types just by making up new
>>> role names?

[David Goodger]
>> No, not arbitrarily.  Authors would have to choose from a
>> pre-determined set of roles, each having pre-existing software
>> support.  For instance, your acronym example would have to have
>> support in the parser, to create the "acronym" elements and associate
>> "title" attributes from a lookup table.
>> 
>> Interpreted text with unknown roles would generate errors.

[Aahz]
> -1 unless it's easier than the current system for adding new directives.

Ease of implementation isn't even on the radar at this point.  Correctness
is dead center.  For most interpreted text processing, it is *wrong* to
delay processing (interpretation) until the Writer.  It is *absolutely
correct* to handle the processing (at least the initial stage) in the
Parser.

I apologize that I can't explain interpreted text any better than I already
have, in the markup spec (spec/rst/reStructuredText.txt), PEP 258,
spec/pysource.txt, previous posts, and this one (I *was* planning to go to
bed early tonight... so please read carefully).  While I'm sure my mental
model will have to adapt as the implementation unfolds (it's a process of
discovery), I'm pretty confident of its foundation.

The current state of the implementation is a red herring.  It's going to
change, because it's a half-assed, incomplete, and totally *wrong*
implementation.

But feel free to try to convince me otherwise.  :)

> Currently to handle a directive you need support in both parser and
> writer, but for interpreted text you only need support in the writer.

You only need Writer support for a directive if the directive introduces a
new element into the DTD.  Some do: directives like "image" and "meta" exist
only to allow the creation of those elements.  Most of the interesting ones
don't, though (like "contents" or "include"); they just insert standard
elements into the doctree.

In most cases there's nothing Writer-specific about interpreted text.
Interpreted text is entirely a reStructuredText markup construct, a way to
get around built-in limitations of the medium.  No other form of markup
would require anything resembling an "<interpreted>" element.

> While I understand what you want to do (and it would make certain parts
> of what I want to do with interpreted text easier), I'm concerned about
> adding complexity.

If an author wants an "acronym" element, they should get one.  A real
acronym element, not an <interpreted role="acronym"> surrogate.  If we allow
the surrogate and take that line of reasoning to the extreme, all we'd need
for a DTD is this bogosity ::

    <!ELEMENT element (#PCDATA | element)*>
    <!ATTLIST element
        name     NMTOKEN   #REQUIRED
        attlist  CDATA     #IMPLIED>

(Which BTW is essentially what DOM is, but it has good reason.)

My reasoning is that all supported interpreted text roles must be known by
the Parser.  There's no guarantee that an arbitrary role will be supported
by the eventual Writer.  Adding a new role is tantamount to adding a new
element to the DTD, may require extensive support, and shouldn't be taken
lightly.  However, there should be a limited number of such roles.  Allowing
different Writers to support different elements would fragment the system
beyond repair.  It must remain a unified whole.  The only place where
variation is acceptable is at the start, at the Reader/Parser interface
(with possible transforms inserted *by* the Reader).  Once past the
Transformer, no variation from the standard Docutils doctree is possible.
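
To make the Parser-side lookup concrete, here is a rough sketch only.  Every
name in it (Node, ACRONYMS, ROLES, interpret) is made up for illustration and
is not part of the Docutils code; it assumes roles live in a fixed registry
and that an unknown role yields an error node rather than a surrogate
element:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Bare-bones doctree node (hypothetical, not the Docutils node API)."""
    tag: str
    text: str = ""
    attributes: dict = field(default_factory=dict)


# Hypothetical lookup table supplying "title" attributes for acronyms.
ACRONYMS = {"DTD": "Document Type Definition"}


def acronym_role(text):
    """Build a real <acronym> element, adding a title when one is known."""
    attributes = {}
    if text in ACRONYMS:
        attributes["title"] = ACRONYMS[text]
    return Node("acronym", text, attributes)


# The pre-determined set of roles known to the parser (assumed registry).
ROLES = {"acronym": acronym_role}


def interpret(role, text):
    """Resolve interpreted text at parse time; unknown roles are errors."""
    handler = ROLES.get(role)
    if handler is None:
        message = 'Unknown interpreted text role "%s".' % role
        return Node("system_message", message, {"level": "error"})
    return handler(text)
```

With this shape, interpret("acronym", "DTD") produces an "acronym" node with
a "title" attribute, while interpret("bogus", "x") produces an error node
and never reaches a Writer.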

On the other hand, most new elements won't require Writer support, because
they will be Reader-specific extensions to the DTD, and will be converted to
standard elements by Reader-specific transforms.  Case in point: the Python
Source Reader, which will use interpreted text extensively.  The default
role will be "Python identifier", which will be further interpreted by
namespace context into <class>, <method>, <module>, <attribute>, etc.
elements (see spec/pysource.dtd), which will be transformed into standard
hyperlink references, which will be processed by the various Writers.  The
point is that no Writer will need to have any knowledge of the Python-Reader
origin of these elements.
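
As a sketch of that last step, under heavy assumptions: the Node class, the
PYSOURCE_TAGS set, and the "refname" attribute below are hypothetical
stand-ins, not the real pysource DTD or transform machinery.  The idea is
just that a Reader-specific transform rewrites Reader-specific elements into
standard ones before any Writer sees the tree:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Bare-bones doctree node (hypothetical, not the Docutils node API)."""
    tag: str
    text: str = ""
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


# Reader-specific element names (assumed here for illustration).
PYSOURCE_TAGS = {"class", "method", "module", "attribute"}


def resolve_python_identifiers(node):
    """Replace Reader-specific elements with standard references, in place."""
    for index, child in enumerate(node.children):
        if child.tag in PYSOURCE_TAGS:
            # The Writer only ever sees a standard hyperlink reference.
            node.children[index] = Node(
                "reference", child.text, {"refname": child.text})
        else:
            resolve_python_identifiers(child)
    return node


def all_tags(node):
    """Collect every tag in the tree, for checking the result."""
    tags = [node.tag]
    for child in node.children:
        tags.extend(all_tags(child))
    return tags
```

After resolve_python_identifiers() runs, all_tags() on the tree should
contain no names from PYSOURCE_TAGS, which is the property a Writer relies
on.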

(I'm sorry if this gives you a sinking feeling, like "all my work is going
to waste".  Rest assured, it won't.  If you check your code into your
sandbox, all the good ideas will eventually migrate into the main tree.  The
eventual form these ideas take may be very different from the current form,
though.)

> Because I know you're going to ask ;-), the current use I'm making of
> interpreted text is chapter/figure/list references::
> 
>     To understand better how mutable and immutable objects work, see
>     code listings :list:`mutable.py` and :list:`immutable.py`, as well
>     as figures :figure:`mutable.eps` and :figure:`immutable.eps`.
>     You'll also want to read :chapter:`Data`.

Thanks for answering without me having to ask. :)  The obvious follow-up
questions are: how should all of that be processed and rendered?  What are
the purpose and semantics?  Why not just use standard hyperlink references?
(I could guess, but I'd rather not.)

I hope this all makes sense.  Good night!

-- 
David Goodger  <go...@py...>  Open-source projects:
  - Python Docutils: http://docutils.sourceforge.net/
    (includes reStructuredText: http://docutils.sf.net/rst.html)
  - The Go Tools Project: http://gotools.sourceforge.net/