[Docutils-develop] Re: reST includes first cut (long...)

Hi David,

Thanks for the great feedback.  I hope you don't mind me taking this
back online; I think there's some valuable stuff here.

On Sat, 2002-08-24 at 08:46, David Goodger wrote:
> Dethe Elza wrote:
> > Included are two python files and three test documents.
> 
> Thank you!  It's great to get contributions!
> 
> > I'm getting going on the HOWTO, but having a problem with one of the
> > test cases that I haven't been able to resolve, but probably you can
> > help.
> 
> I'll do my best.  I just hope I don't scare you away.

Not a bit of it.

> > Here's the problem.  If you process test1.rst using the standard
> > html.py it includes test2.rst and processes it, which includes
> > test3.rst.  The content of test3.rst shows up OK, but it isn't being
> > processed as reST.
> 
> I haven't actually run the code yet.  I'll do that next.  In the
> meantime, here are some comments on what you've sent:
> 
> > There's a util.py file which gives some helpers for creating
> > directives
> 
> I'm looking at the "utils.cheapDirective" function you wrote.  I think
> it's a very good idea to provide a generic directive parser function,
> but to be truly useful, this function needs to be even *more* generic.
> Please bear with me here.

More generic would be good.  I was working out from the images directive
and trying to factor what was specific to that directive from what was
common to all directives as I went.

> From the markup spec, there are 3 physical parts to a directive:
> 
> 1. Directive type (identifier before the "::")
> 2. Directive data (remainder of the first line)
> 3. Directive block (indented text following the first line)

Except that there is a fourth part: The attribute list.

> These correspond to the syntax diagram::
> 
>     +-------+--------------------------+
>     | ".. " | directive type "::" data |
>     +-------+ directive block          |
>             |                          |
>             +--------------------------+
> 
> Looking at the definition of a directive, and descriptions of the
> existing directives, and your code, I've realized that there are rules
> we can apply here.  The physical directive data and block comprise up
> to 3 *logical* parts:
> 
> 1. Directive arguments.
> 2. Directive attributes (or "options", in command-line terms).
> 3. Directive content.

This is beginning to look good.

> Individual directives can use any combination of these:
> 
> - The admonitions directives ("note" etc.), "topic", "line-block", and
>   "parsed-literal" use content only, no arguments or attributes. (3)
> - "meta" also has only content; the content looks like an attribute
>   block, but it's parsed as a field list. (3)
> - "image" and "contents" each have one argument, and attributes. (1,2)
> - "figure" has one argument, attributes, and content. (1,2,3)
> - "sectnum" has attributes only. (2)
> - "target-notes" has nothing. ()
> 
> If a directive has arguments and/or attributes, they are taken from
> the directive data and directive block, up to the first blank line.
> (Note that the docs for the "raw" directive didn't have a blank line;
> that was a mistake, and I've fixed it.)  If the directive has content
> also, it is taken after the arguments and/or attributes, from the
> directive block.  If the directive has content only, it is taken from
> the directive data and directive block, up to the end of the
> indentation.
> 
> I think the "cheapDirective" function should be reworked to allow any
> variation of directive structure to be parsed, renamed to
> "parse_directive", and put into directives/__init__.py.  Let me know
> if you'd like to give this a try, or if you'd rather I did it.  But
> read on; there's a wrinkle ahead.

I think I can do that.

> > In order to give some context to the debate over :raw: vs.
> > :include:, I implemented both.
> 
> I think the "include" directive should be reStructuredText-only, and
> the "raw" directive should have an optional second argument, a path.
> They definitely should not duplicate each other's functionality
> (TOOWTDI).  In the terms defined above,

I'm not disagreeing, but I got a bit lost in the debate over :raw: vs.
:include: and wanted a) some context to help me form an opinion, and b)
to test the generalized directive parsing by using it for more than one
directive.

> - "include" should have one argument, a path. (1)
> 
> - "raw" should have one or two arguments -- a format ("html" etc.) and
>   an optional path -- and content, but only if there was no path
>   argument.  So this would be either a (1) or a (1,3) directive.  So
>   that forces us to split the parse in two, perhaps into
>   "parse_directive" (which parses the arguments & attributes), plus
>   "get_directive_content".

Not necessarily.  Is there any reason we can't parse a directive and
return a 3-tuple (arguments, options, content)?  It would then be up to
the individual directive to test that these do have values, if required.
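
As a rough sketch of the 3-tuple idea (this is not the code from the
attached files; the splitting rules are assumptions based on the
description above: arguments and options run up to the first blank
line, and content is whatever follows it):

```python
import re

# Matches a field-style option line such as ":format: html" or ":raw:".
OPTION_RE = re.compile(r"^:(?P<name>[^:\s]+):\s*(?P<value>.*)$")


def parse_directive(data, block):
    """Split a directive into (arguments, options, content).

    `data` is the remainder of the first line after "::"; `block` is the
    list of already-dedented lines that follow it.
    """
    arguments = data.split()
    options = {}
    lines = iter(block)
    for line in lines:
        if not line.strip():
            break  # the first blank line ends arguments and options
        match = OPTION_RE.match(line.strip())
        if match:
            options[match.group("name")] = match.group("value").strip()
        else:
            arguments.extend(line.split())
    content = list(lines)  # whatever remains after the blank line
    return arguments, options, content
```

Each directive would then check the tuple itself, e.g. "include" would
insist on exactly one argument and no content.  Content-only directives
such as "note", whose content may start on the first line, would need
different handling that this sketch does not attempt.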

> Note that there are no attributes/options now.  If an attribute is
> required, it shouldn't be an attribute.  It's analogous to
> command-line options: "required option" is an oxymoron.  I'm thinking
> of changing the terminology in the spec from "attribute" to "option"
> to help reinforce this.

OK, but this means sub-parsing the arguments instead of pulling data out
of the attributes.  My preference would be to have a standard way to
specify not only the types of attributes, but default values for them.
Then there are no optional attributes, just attributes that keep their
default or are explicitly set.  This is a more XML-ish way to go.
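
A minimal sketch of what such a declaration might look like, assuming a
per-directive table of (converter, default) pairs.  The names
apply_option_spec and IMAGE_SPEC, and the option names in the table, are
hypothetical and only illustrate the idea:

```python
def apply_option_spec(options, spec):
    """Return converted option values, filling in defaults from `spec`.

    `options` maps option names to raw strings as typed by the user;
    `spec` maps each allowed option name to a (converter, default) pair.
    """
    unknown = set(options) - set(spec)
    if unknown:
        raise ValueError("unknown option(s): " + ", ".join(sorted(unknown)))
    result = {}
    for name, (convert, default) in spec.items():
        if name in options:
            result[name] = convert(options[name])  # explicitly set
        else:
            result[name] = default  # fall back to the declared default
    return result


# Hypothetical spec for an image-like directive.
IMAGE_SPEC = {"height": (int, None), "width": (int, None), "scale": (int, 100)}
```

With this, apply_option_spec({"height": "100"}, IMAGE_SPEC) yields a
value for every declared option, so the directive code never has to ask
whether an option was given.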

> Looking at the directives in the test1.rst file in order, first
> there's::
> 
>     .. include:: file:test2.rst
> 
> I'm not comfortable with URL syntax here.  I really think it's YAGNI,
> and may open up a big can of worms.  So that one should become::

Well, I think we *do* need URIs if we want includes to be useful to
ZReST (which is not file-system based, but lives in an object-oriented
database accessible by URI).  Obviously, we don't need a file: URI for
such a trivial (and file-based) example, that's just a way to test that
URIs work in general.

>     .. include:: test2.rst
> 
> The next one is an "include"/"raw" hybrid::
> 
>     .. include:: test3.rst
>        :raw: 
>        :format: html

Well, there were a few possibilities tossed around in email.  I couldn't
remember why the format was even needed (it's *raw*, what else do we
need?), but then my data wouldn't show up using html.py and I had to dig
through the writer code to figure out that raw nodes are tested for
format type.

> It should be::
> 
>     .. raw:: html test3.rst

OK, if that's the correct way to do it in reST, I'll do that.  I still
prefer the default attributes method discussed above rather than
multiple arguments because it makes the names and types of the
argument/option/attribute explicit.

> I noticed that you've got the attributes *after* the content in the
> next ones::
> 
>     .. include::
>        This is a <super>Test</super> of the <sub>Emergency</sub>
>        <strike>Broadcasting</strike> &quot;System&quot;
>        :raw:
>        :format: html

Yes, that was my mis-parsing of the directive (because I was working
from images.py, not the spec).  The way I was grabbing (or failing to
grab?) the content threw an exception if I put the attributes first, but
worked OK if I put them after.  I didn't like this either.

> Attributes always come *before* the content (who's to say that
> ":format: html" isn't valid raw data in some format?).  In any case,
> that directive should become::
> 
>     .. raw:: html

OK.  But the same argument applies.  Who's to say :format: html isn't
valid raw data at the beginning of the data just as easily as at the
end?  I agree that it's aesthetically and functionally better to have the
attributes first, just not because the string ":format: html" might
appear in the data.

Also, the problem with using multiple arguments comes up.  Arguments to :raw: are

path format
or
format

But it seems more intuitive to me to put the required argument first: so
the first argument to :raw: would always be format and we don't have to
run tests or special cases.  If there's a second argument, then it's
path.  This still isn't as clear and explicit (to me) as using
attributes, but better than having argument position be dependent on
number of arguments.
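
With format always first, the check stays simple.  A sketch, assuming
the arguments have already been split into a list and the content into
a list of lines (the function name interpret_raw is hypothetical):

```python
def interpret_raw(arguments, content):
    """Return (format, path) for a raw directive; path is None if absent.

    The first argument is always the format; an optional second argument
    is a path.  Inline content is only allowed when no path is given.
    """
    if not 1 <= len(arguments) <= 2:
        raise ValueError(
            "raw expects a format and an optional path, got %d argument(s)"
            % len(arguments)
        )
    fmt = arguments[0]
    path = arguments[1] if len(arguments) == 2 else None
    if path is not None and content:
        raise ValueError("raw takes either a path or inline content, not both")
    return fmt, path
```

No special cases on argument position are needed: the length of the
list alone says whether a path was given.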

>        This is a <super>Test</super> of the <sub>Emergency</sub>
>        <strike>Broadcasting</strike> &quot;System&quot;
> 
> Next::
> 
>     .. raw::
>        This is <strong>RAW</strong>.  Really, <em>really,</em> raw.
>        :format: html
> 
> Should become::
> 
>     .. raw:: html
> 
>        This is <strong>RAW</strong>.  Really, <em>really,</em> raw.
> 
> (Note the blank line.)

Oh, there's a blank line *before* the content.  I didn't get that.  I
thought the directive *ends* with a blank line.  Yikes.  Now I see why
you want get_directive_content().  Doesn't that introduce all sorts of
possible ambiguities?  Shouldn't there be one format for directives in
all cases so that the users (typing in raw text with reST getting in
their way as little as possible) don't have to remember the special
cases for different directives?

> The last "raw" directive was::
> 
>     .. raw::
>        :include: test3.rst
>        :format: html
> 
> And should become::
> 
>     .. raw:: html test3.rst
> 
> (This is the same as the second directive, therefore redundant.)

The redundancy is deliberate.  I was testing two different possibilities
and wanted to be sure they both worked.  They do the same thing,
therefore they work.

> I've attached an edited test1.rst file.
> 
> > Let me know if anything should be done differently.  I've tried to
> > conform to the project policies (although reading through now I realize
> > that I used triple-single-quotes for docstrings rather than
> > triple-double-quotes, but the files are already zipped).  I'm cool with
> > criticism and still not terribly familiar with the internals, so any
> > advice on how to do it better would be appreciated.
> 
> I hope I haven't scared you off.  If you do decide to continue with
> this, it will give you a chance to fix the triple-single-quotes. ;-)
> Also, I noticed that you used non-standard 2-space indents in some
> places; naughty!

Ah yes, old habits die hard.

> BTW, I never knew that you could do sequence-unpacking on the
> exception arguments.  Thanks for teaching me something new!

My pleasure.  I learned it while reading the python docs on exception
handling last week while working on this, so in a way, you taught it to
me!
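
For anyone reading along without the attachments, the idea looks
roughly like this.  This is a sketch in current Python, not the code
from the zip file; there the tuple is unpacked from the exception's
args, and the function name is made up for illustration:

```python
def describe_open_failure(path):
    """Try to open `path`; return (code, message) on failure, else None."""
    try:
        with open(path):
            return None
    except OSError as exc:
        # For a failed open(), args holds (errno, strerror);
        # sequence unpacking pulls both out in one step.
        code, message = exc.args
        return code, message
```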

So what you'd like to see is:

:raw: [filepath] format

:include: filepath (which must be reST)

and functions for directives/__init__.py

parse_directive()

and 

get_directive_content()

right?

--Dethe