[Docutils-develop] Re: reST includes first cut (long...)

> And yes, you're right about returning a uniform set of data.  But we
> actually need a 4-tuple; read on.

Yes, I originally wrote "4-tuple," then modified it to reflect what
you'd written about the spec.

> The concept of "optional" is useful.  I'm using the XML idea of
> #IMPLIED attributes, rather than defaults in the DTD (or in the
> attribute parsing code, in our case).  I've always found it more
> flexible for the downstream parts of the processing chain to make the
> decisions; keep your options open.  If you put in default values
> early, you lose the information that the attribute just *wasn't
> specified*, and that can be valuable information lost.

I think that this complicates processing by the user (programmer) of a
directive.  When searching for a value, we'll always have to test
whether the value is set or not.  Note that the #IMPLIED vs. DTD
default isn't what I'm arguing here--that's an implementation detail.
If you're parsing HTML or another known format and the spec says that
htmlOption.selected returns a boolean, you want to get a boolean whether
the option was set explicitly <option selected="false"/> or not
<option/>.  

If all we want to know is "was this set?", we can initialize to false
(0) and pass the exists function (def exists(arg): return 1).  If we
need to know existence as well as value, then return a tuple (def
existsInt(arg): return (1, int(arg))).  Again, this is explicit vs.
implicit, and it allows you to change things like:

if attributes.has_key('selected') and attributes['selected']:
  do_if_true()
else:
  do_if_false()

to:

if attributes['selected']:
  do_if_true()
else:
  do_if_false()

Which looks cleaner to me.  I guess there's some question of whether
this is a valid use case.  Personally, I would rather know that I can
access an attribute and get a value back consistently, without having to
test for it.  I guess it depends on where and how often we need to know
whether a value has not been set at all.
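
Here is a minimal sketch of what I mean.  The convert_options helper,
the spec dictionary and the defaults dictionary are hypothetical names
made up for illustration; none of this is existing Docutils code.

```python
def exists(arg):
    """Conversion function for flag options: being present means true."""
    return 1


def exists_int(arg):
    """Report both existence and the converted integer value."""
    return (1, int(arg))


def convert_options(raw, spec, defaults):
    """Start from explicit defaults, then convert whatever was supplied.

    Hypothetical helper: `raw` maps option names to their raw text,
    `spec` maps option names to conversion functions.
    """
    result = dict(defaults)
    for name, text in raw.items():
        result[name] = spec[name](text)
    return result


spec = {'selected': exists, 'width': exists_int}
defaults = {'selected': 0, 'width': (0, None)}

# Only "selected" was written in the document; "width" keeps its default.
options = convert_options({'selected': ''}, spec, defaults)

# No membership test is needed before reading the value.
if options['selected']:
    state = 'selected'
else:
    state = 'not selected'
```

The (exists, value) tuple keeps the "wasn't specified" information for
options where it matters, so initializing defaults early doesn't have
to throw it away.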

> Modelling directives on shell commands works well.  Let's go with it.

Does it?  I find shell command options to be really difficult to
remember, after ten years of steady use.  And shell commands are
inherently verbs, making their options adverbs, if you will.  Documents
are inherently nouns, making their options/attributes adjectives, to
stretch a metaphor.

> > Well, I think we *do* need URIs if we want includes to be useful to
> > ZReST (which is not file-system based, but lives in an
> > object-oriented database accessible by URI).  Obviously, we don't
> > need a file: URI for such a trivial (and file-based) example, that's
> > just a way to test that URIs work in general.
> 
> OK, good use case.  But I'm still uncomfortable with the "openAny"
> function you sent::
> 
>     def openAny(path):
>         try:
>           # is it a file?
>           return open(path)
>         except :
>           try:
>             # is it a url?
>             return urlopen(path)
>           except (URLError, ValueError):
>             # treat as a string
>             return StringIO(path)
> 
> Especially the final StringIO part; the function should simply fail.
> Should we check if "path" is a URL first, to avoid the "open(path)"
> failure?  Or is this a case of "look before you leap" vs. "it's easier
> to ask forgiveness than permission"?  Can a URI look like a filesystem
> path?  Seems a bit ambiguous to me.

Sure.  There's no need to have openAny handle strings if we're not
allowing the include directive to have a raw attribute.  That was an
artifact of how I was interpreting include at the time.
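
Something like the following might address the ambiguity.  It is only
a sketch in present-day Python: the name open_any and the list of
accepted schemes are my assumptions, and it "looks before it leaps" by
checking for a URL scheme first and simply failing otherwise.

```python
from urllib.parse import urlparse
from urllib.request import urlopen

# Assumed set of schemes to treat as URLs; anything else is a file path.
URL_SCHEMES = ('http', 'https', 'ftp', 'file')


def open_any(path):
    """Open `path` as a URL if it has a known scheme, else as a local file.

    There is no string fallback: a missing file or an unreachable URL
    raises an exception for the caller to handle.
    """
    scheme = urlparse(path).scheme.lower()
    if scheme in URL_SCHEMES:
        return urlopen(path)
    return open(path)
```

Checking against an explicit list of schemes means a Windows path such
as C:\docs\file.txt (which parses with the "scheme" c) is still
treated as a file.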

> Out of curiosity, what would a Zope/ZReST URL look like?

Zope turns object references into URI paths.  So a folder which lives at
http://myserver.com/myfolder

could have a document which lives at
http://myserver.com/myfolder/mydocument

which could be processed by a method edit, returning the document in an
edit form
http://myserver.com/myfolder/mydocument/edit

or maybe searched by xpath
http://myserver.com/myfolder/mydocument/xpath?/root/branch/leaf[@selected]

In other words, the URI paths are turned into object references in the
ZODB (Zope Object Database) and the default action is taken on them
(call, if it's a method, transform into HTML if it's data, etc.)  It's a
rich and complex environment, with a bit too much magic going on behind
the scenes for my taste.

> > The way I was grabbing (or failing to grab?) the content threw an
> > exception if I put the attributes first, but worked OK if I put them
> > after.  I didn't like this either.
> 
> The spec is there for a reason. :-) But it's not immutable.  It can be
> changed when there's good reason.  The code too; there's a *reason*
> we're not at release 1.0 yet!  This is a learning experience.

Yup.  And I'm learning.  I have read the spec, but I haven't
*internalized* it yet the way I have with, say, the DOM.  So I'm
learning mostly from the examples of the code, and trying to keep to the
spec.  But I know I'll still goof at this point (this is very much a
side project for me right now), so I'm very glad you're there for a
sanity check.

> > But it seems more intuitive to me to put the required argument
> > first: so the first argument to :raw: would always be format and we
> > don't have to run tests or special cases.  If there's a second
> > argument, then it's path.
> 
> You mis-read.  It *is* ".. raw:: format [path]" (the *second*
> argument, "path", is optional).

Oops.  My bad.

> > This still isn't as clear and explicit (to me) as using attributes,
> > but better than having argument position be dependent on number of
> > arguments.
> 
> It could be::
> 
>     .. raw:: format
>        :source: path/URL
> 
> Nothing wrong with that, I suppose.  Since it is an *optional*
> argument, it does fit into the "option" mold.  And in fact, it could
> be even more explicit (and remove my misgivings at the same time) if
> we made the option more specific::
> 
>     .. raw:: format
>        :file: path
> 
>     .. raw:: format
>        :url: URL

Hey, I like it.  

> > Yikes.  Now I see why you want get_directive_content().  Doesn't
> > that introduce all sorts of possible ambiguities?  Shouldn't there
> > be one format for directives in all cases so that the users (typing
> > in raw text with reST getting in their way as little as possible)
> > don't have to remember the special cases for different directives?
> 
> Yes and yes, and I came to the same conclusion, as discussed above.
> Just "parse_directive()" will be sufficient.

Cool.

> * ".. include:: filepath" (must be reStructuredText).
> 
> * ".. raw:: format" + either:
> 
>   - a second, optional "filepath" argument, or
>   - a "source" (or equivalent) option, or
>   - (perhaps best?) *two* options, one for a filesystem path source,
>     the other for a URL source (and we can't use both in one
>     directive)

There's actually a use case for including both.  It's the same as using
both PUBLIC and SYSTEM identifiers for a DTD.  Basically, it says, get
it from this URL if available, or this file if you can't get to the URL
for some reason.  A fallback option, in other words.
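
A rough sketch of that fallback, in present-day Python.  The function
name read_raw_source and its url, file and fetch parameters are
hypothetical; fetch exists only so the network call can be replaced
when testing.

```python
from urllib.request import urlopen


def read_raw_source(url=None, file=None, fetch=urlopen):
    """Return the bytes from `url`, falling back to `file` if that fails."""
    if url is None and file is None:
        raise ValueError('need a "url" or a "file" source')
    if url is not None:
        try:
            with fetch(url) as response:
                return response.read()
        except OSError:
            # urllib.error.URLError subclasses OSError, so network
            # failures land here.  Without a fallback file, re-raise.
            if file is None:
                raise
    with open(file, 'rb') as source:
        return source.read()
```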

>   If the "external source" argument or option is specified (in
>   whatever form), there can be no directive content.  If there is,
>   it's an error.
> 
> * A single function for directives/__init__.py, "parse_directive".
>   (Plus any auxiliary functions required, of course.)
> 
> * An exception for directives/__init__.py::
> 
>      class DirectiveParseError(docutils.ApplicationError): pass
> 
>   (It doesn't need an "__init__" method, but it doesn't hurt much.)
> 
> The signature for "parse_directive" could be something like this::
> 
>     def parse_directive(match, type_name, state, state_machine,
>                         option_presets, arguments=None,
>                         option_spec={}, content=None):
>         """
>         Parameters:
> 
>         - `match`, `type_name`, `state`, `state_machine`, and
>           `option_presets`: See `docutils.parsers.rst.directives.__init__`.
>         - `arguments`: A 2-tuple of the number of ``(required,
>           optional)`` whitespace-separated arguments to parse, or
>           ``None`` if no arguments (same as ``(0, 0)``).  If an
>           argument may contain whitespace (multiple words), specify
>           only one argument (either required or optional); the client
>           code must do any context-sensitive parsing.
>         - `option_spec`: A dictionary, mapping known option names to
>           conversion functions such as `int` or `float`.  ``None`` or
>           an empty dict implies no options to parse.
>         - `content`: A boolean; true if content is allowed.  Client
>           code must handle the case where content is required but not
>           supplied (an empty content list will be returned).
> 
>         Returns a 4-tuple: list of arguments, dict of options, list of
>         strings (content block), and a boolean (blank finish).
> 
>         Or raises `DirectiveParseError` with arguments: node (system
>         message), boolean (blank finish).
>         """
> 
> Once "parse_directive" is ready, we'll be able to convert all existing
> directives to use it.  As a side-effect, we will be able to drop the
> "data" parameter from the directive function signature.  The end
> result will be much simpler directive code.  Great result!

Looks great.  Thanks for the comments and the help.
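
To check my understanding of the "arguments" and "option_spec"
parameters, here is a toy sketch in present-day Python.  The helpers
parse_arguments and convert_options are hypothetical, and this
DirectiveParseError is a simplified stand-in: it carries a plain
message rather than the system-message node and blank-finish flag
described in the docstring above.

```python
class DirectiveParseError(Exception):
    """Simplified stand-in for the proposed exception."""


def parse_arguments(text, arguments=None):
    """Split `text` according to a ``(required, optional)`` 2-tuple."""
    required, optional = arguments or (0, 0)
    words = text.split()
    if required + optional == 1 and words:
        # A single argument may contain whitespace; keep it whole.
        words = [text.strip()]
    if len(words) < required:
        raise DirectiveParseError(
            '%d argument(s) required, %d supplied' % (required, len(words)))
    if len(words) > required + optional:
        raise DirectiveParseError(
            'at most %d argument(s) allowed, %d supplied'
            % (required + optional, len(words)))
    return words


def convert_options(raw_options, option_spec=None):
    """Apply each option's conversion function; reject unknown names."""
    option_spec = option_spec or {}
    options = {}
    for name, value in raw_options.items():
        if name not in option_spec:
            raise DirectiveParseError('unknown option: "%s"' % name)
        try:
            options[name] = option_spec[name](value)
        except (ValueError, TypeError) as error:
            raise DirectiveParseError(
                'invalid value for option "%s": %s' % (name, error))
    return options
```

With this, ".. raw:: format [path]" would be checked with
parse_arguments(text, (1, 1)), and a ":file:" option would be declared
as {'file': str} in the option spec.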

--Dethe