|
From: Dethe E. <de...@ma...> - 2002-08-26 17:17:58
|
Hi David,
Thanks for the great feedback. I hope you don't mind me taking this
back online, I think there's some valuable stuff here.
On Sat, 2002-08-24 at 08:46, David Goodger wrote:
> Dethe Elza wrote:
> > Included are two python files and three test documents.
>
> Thank you! It's great to get contributions!
>
> > I'm getting going on the HOWTO, but having a problem with one of the
> > test cases that I haven't been able to resolve, but probably you can
> > help.
>
> I'll do my best. I just hope I don't scare you away.
Not a bit of it.
> > Here's the problem. If you process test1.rst using the standard
> > html.py it includes test2.rst and processes it, which includes
> > test3.rst. The content of test3.rst shows up OK, but it isn't being
> > processed as reST.
>
> I haven't actually run the code yet. I'll do that next. In the
> meantime, here are some comments on what you've sent:
>
> > There's a util.py file which gives some helpers for creating
> > directives
>
> I'm looking at the "utils.cheapDirective" function you wrote. I think
> it's a very good idea to provide a generic directive parser function,
> but to be truly useful, this function needs to be even *more* generic.
> Please bear with me here.
More generic would be good, I was working out from the images directive
and trying to factor what was specific to that directive from what was
common to all directives as I went.
> From the markup spec, there are 3 physical parts to a directive:
>
> 1. Directive type (identifier before the "::")
> 2. Directive data (remainder of the first line)
> 3. Directive block (indented text following the first line)
Except that there is a fourth part: The attribute list.
> These correspond to the syntax diagram::
>
>     +-------+--------------------------+
>     | ".. " | directive type "::" data |
>     +-------+ directive block          |
>             |                          |
>             +--------------------------+
>
> Looking at the definition of a directive, and descriptions of the
> existing directives, and your code, I've realized that there are rules
> we can apply here. The physical directive data and block comprise up
> to 3 *logical* parts:
>
> 1. Directive arguments.
> 2. Directive attributes (or "options", in command-line terms).
> 3. Directive content.
This is beginning to look good.
> Individual directives can use any combination of these:
>
> - The admonitions directives ("note" etc.), "topic", "line-block", and
> "parsed-literal" use content only, no arguments or attributes. (3)
> - "meta" also has only content; the content looks like an attribute
> block, but it's parsed as a field list. (3)
> - "image" and "contents" each have one argument, and attributes. (1,2)
> - "figure" has one argument, attributes, and content. (1,2,3)
> - "sectnum" has attributes only. (2)
> - "target-notes" has nothing. ()
>
> If a directive has arguments and/or attributes, they are taken from
> the directive data and directive block, up to the first blank line.
> (Note that the docs for the "raw" directive didn't have a blank line;
> that was a mistake, and I've fixed it.) If the directive has content
> also, it is taken after the arguments and/or attributes, from the
> directive block. If the directive has content only, it is taken from
> the directive data and directive block, up to the end of the
> indentation.
>
> I think the "cheapDirective" function should be reworked to allow any
> variation of directive structure to be parsed, renamed to
> "parse_directive", and put into directives/__init__.py. Let me know
> if you'd like to give this a try, or if you'd rather I did it. But
> read on; there's a wrinkle ahead.
I think I can do that.
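
A minimal sketch of the logical split described above, assuming the
directive data and block have already been collected into a list of
dedented lines. The helper name `split_directive_block` and its
`has_header` flag are hypothetical, not part of Docutils:

```python
def split_directive_block(lines, has_header=True):
    """Split dedented directive lines into (header_lines, content_lines).

    header_lines holds the arguments and/or options: everything up to
    the first blank line.  content_lines holds whatever follows it.
    For a content-only directive (has_header=False) every line is content.
    """
    if not has_header:
        return [], list(lines)
    header = []
    content = []
    for index, line in enumerate(lines):
        if not line.strip():
            # The first blank line ends the arguments/options.
            content = list(lines[index + 1:])
            break
        header.append(line)
    return header, content


# A "figure"-style directive: one argument, one option, then content.
header, content = split_directive_block(
    ["picture.png", ":width: 40", "", "A caption."])
```

Here `header` ends up holding the argument and option lines and
`content` the caption; a "note"-style directive would pass
`has_header=False` and get everything back as content.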
> > In order to give some context to the debate over :raw: vs.
> > :include:, I implemented both.
>
> I think the "include" directive should be reStructuredText-only, and
> the "raw" directive should have an optional second argument, a path.
> They definitely should not duplicate each other's functionality
> (TOOWTDI). In the terms defined above,
I'm not disagreeing, but I got a bit lost in the debate over :raw: vs.
:include: and wanted a) some context to help me form an opinion, and b)
to test the generalized directive parsing by using it for more than one
directive.
> - "include" should have one argument, a path. (1)
>
> - "raw" should have one or two arguments -- a format ("html" etc.) and
> an optional path -- and content, but only if there was no path
> argument. So this would be either a (1) or a (1,3) directive. So
> that forces us to split the parse in two, perhaps into
> "parse_directive" (which parses the arguments & attributes), plus
> "get_directive_content".
Not necessarily. Is there any reason we can't parse a directive and
return a 3-tuple (arguments, options, content). It would then be up to
the individual directive to test that these do have values, if required.
> Note that there are no attributes/options now. If an attribute is
> required, it shouldn't be an attribute. It's analogous to
> command-line options: "required option" is an oxymoron. I'm thinking
> of changing the terminology in the spec from "attribute" to "option"
> to help reinforce this.
OK, but this means sub-parsing the arguments instead of pulling data out
of the attributes. My preference would be to have a standard way to
specify not only the types of attributes, but default values for them.
Then there are no optional attributes, just default or explicitly set.
This is a more XML-ish way to go.
> Looking at the directives in the test1.rst file in order, first
> there's::
>
> .. include:: file:test2.rst
>
> I'm not comfortable with URL syntax here. I really think it's YAGNI,
> and may open up a big can of worms. So that one should become::
Well, I think we *do* need URIs if we want includes to be useful to
ZReST (which is not file-system based, but lives in an object-oriented
database accessible by URI). Obviously, we don't need a file: URI for
such a trivial (and file-based) example, that's just a way to test that
URIs work in general.
> .. include:: test2.rst
>
> The next one is an "include"/"raw" hybrid::
>
>     .. include:: test3.rst
>        :raw:
>        :format: html
Well, there were a few possibilities tossed around in email. I couldn't
remember why the format was even needed (it's *raw*, what else do we
need?), but then my data wouldn't show up using html.py and I had to dig
through the writer code to figure out that raw nodes are tested for
format type.
> It should be::
>
> .. raw:: html test3.rst
OK, if that's the correct way to do it in reST, I'll do that. I still
prefer the default attributes method discussed above rather than
multiple arguments because it makes the names and types of the
argument/option/attribute explicit.
> I noticed that you've got the attributes *after* the content in the
> next ones::
>
>     .. include::
>        This is a <super>Test</super> of the <sub>Emergency</sub>
>        <strike>Broadcasting</strike> "System"
>        :raw:
>        :format: html
Yes, that was my mis-parsing of the directive (because I was working
from images.py, not the spec). The way I was grabbing (or failing to
grab?) the content threw an exception if I put the attributes first, but
worked OK if I put them after. I didn't like this either.
> Attributes always come *before* the content (who's to say that
> ":format: html" isn't valid raw data in some format?). In any case,
> that directive should become::
>
> .. raw:: html
OK. But the same arguments apply. Who's to say :format: html isn't
valid raw data at the beginning of the data as well as easily as at the
end? I agree that it's aesthetically and functionally better to have the
attributes first, just not because the string ":format: html" might
appear in the data.
Also, the problem with using multiple arguments comes up. Arguments to :raw: are
path format
or
format
But it seems more intuitive to me to put the required argument first: so
the first argument to :raw: would always be format and we don't have to
run tests or special cases. If there's a second argument, then it's
path. This still isn't as clear and explicit (to me) as using
attributes, but better than having argument position be dependent on
number of arguments.
> This is a <super>Test</super> of the <sub>Emergency</sub>
> <strike>Broadcasting</strike> "System"
>
> Next::
>
>     .. raw::
>        This is <strong>RAW</strong>. Really, <em>really,</em> raw.
>        :format: html
>
> Should become::
>
>     .. raw:: html
>
>        This is <strong>RAW</strong>. Really, <em>really,</em> raw.
>
> (Note the blank line.)
Oh, there's a blank line *before* the content. I didn't get that. I
thought the directive *ends* with a blank line. Yikes. Now I see why
you want get_directive_content(). Doesn't that introduce all sorts of
possible ambiguities? Shouldn't there be one format for directives in
all cases so that the users (typing in raw text with reST getting in
their way as little as possible) don't have to remember the special
cases for different directives?
> The last "raw" directive was::
>
>     .. raw::
>        :include: test3.rst
>        :format: html
>
> And should become::
>
> .. raw:: html test3.rst
>
> (This is the same as the second directive, therefore redundant.)
The redundancy is deliberate. I was testing two different possibilities
and wanted to be sure they both worked. They do the same thing,
therefore they work.
> I've attached an edited test1.rst file.
>
> > Let me know if anything should be done differently. I've tried to
> > conform to the project policies (although reading through now I realize
> > that I used triple-single-quotes for docstrings rather than
> > triple-double-quotes, but the files are already zipped. I'm cool with
> > criticism and still not terribly familiar with the internals, so any
> > advice on how to do it better would be appreciated.
>
> I hope I haven't scared you off. If you do decide to continue with
> this, it will give you a chance to fix the triple-single-quotes. ;-)
> Also, I noticed that you used non-standard 2-space indents in some
> places; naughty!
Ah yes, old habits die hard.
> BTW, I never knew that you could do sequence-unpacking on the
> exception arguments. Thanks for teaching me something new!
My pleasure. I learned it while reading the python docs on exception
handling last week while working on this, so in a way, you taught it to
me!
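
For readers on current Python: the tuple-unpacking form inside an
`except` clause was removed in Python 3, but the same values are still
available from the exception's `args`. A small sketch, in which the
function name and the missing path are made up for illustration:

```python
import os
import tempfile


def describe_open_failure(path):
    """Return (code, message) if opening `path` fails, else None."""
    try:
        with open(path):
            return None
    except OSError as error:
        # Unpack the exception arguments explicitly.
        code, message = error.args
        return code, message


# Hypothetical path that should not exist on any machine.
missing = os.path.join(tempfile.gettempdir(), "no-such-dir-xyz", "test3.rst")
result = describe_open_failure(missing)
```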
So what you'd like to see is:
:raw: [filepath] format
:include: filepath (which must be reST)
and functions for directives/__init__.py
parse_directive()
and
get_directive_content()
right?
--Dethe
|
|
From: David G. <go...@us...> - 2002-08-27 00:49:12
|
I'd already begun a follow-up to my last message, which I'll
interweave into this reply.
[Dethe]
> Thanks for the great feedback. I hope you don't mind me taking this
> back online, I think there's some valuable stuff here.
I don't mind at all. Actually, I prefer the discussions online; it
opens up the possibility for more input.
[David]
>> I just hope I don't scare you away.
>
> Not a bit of it.
Glad to hear it. In the message I'd already begun, I wrote:
If this small task has grown too big or time-consuming or
onerous for you to handle, please let me know and I'll take over,
with gratitude for work done and ideas inspired.
So you've got an escape clause if you change your mind. ;-)
>> I'm looking at the "utils.cheapDirective" function you wrote. I
>> think it's a very good idea to provide a generic directive parser
>> function, but to be truly useful, this function needs to be even
>> *more* generic. Please bear with me here.
>
> More generic would be good, I was working out from the images
> directive and trying to factor what was specific to that directive
> from what was common to all directives as I went.
And a very fruitful side-trip it has become! It's a step I hadn't
taken. Every new directive began with a copy & paste from the most
similar existing one, but I hadn't looked at the patterns yet.
>> From the markup spec, there are 3 physical parts to a directive:
>>
>> 1. Directive type (identifier before the "::")
>> 2. Directive data (remainder of the first line)
>> 3. Directive block (indented text following the first line)
>
> Except that there is a fourth part: The attribute list.
That's a *logical* part, not physical (at least, not by the original
definition). No matter; I think we'll dispense with the old
"physical" terms as obsolete.
>> I think the "include" directive should be reStructuredText-only,
>> and the "raw" directive should have an optional second argument, a
>> path. They definitely should not duplicate each other's
>> functionality (TOOWTDI). In the terms defined above,
>
> I'm not disagreeing, but I got a bit lost in the debate over :raw:
> vs. :include: and wanted a) some context to help me form an
> opinion, and b) to test the generalized directive parsing by using
> it for more than one directive.
Sure. That's cool.
>> - "include" should have one argument, a path. (1)
>>
>> - "raw" should have one or two arguments -- a format ("html" etc.)
>> and an optional path -- and content, but only if there was no
>> path argument. So this would be either a (1) or a (1,3)
>> directive. So that forces us to split the parse in two, perhaps
>> into "parse_directive" (which parses the arguments & attributes),
>> plus "get_directive_content".
>
> Not necessarily. Is there any reason we can't parse a directive and
> return a 3-tuple (arguments, options, content). It would then be up
> to the individual directive to test that these do have values, if
> required.
Originally, I was thinking that if a directive didn't *need* a content
block, it shouldn't consume any that happened to be there. That
would allow a directive to be followed by a block quote, but it makes
directives conceptually more difficult than they need to be. While
updating the markup and directive specs as per my last message, I
thought some more about the logical parts of a directive, and I now
believe that the original thinking was wrong. Instead, all indented
text following a directive *should* be consumed by the directive. The
"parse_directive" function should signal an error if there *is*
content, but the directive doesn't ask for it. If a block quote
should follow a directive, an empty comment inserted between them will
do the trick. Docs updated accordingly.
And yes, you're right about returning a uniform set of data. But we
actually need a 4-tuple; read on.
>> Note that there are no attributes/options now. If an attribute is
>> required, it shouldn't be an attribute. It's analogous to
>> command-line options: "required option" is an oxymoron. I'm thinking
>> of changing the terminology in the spec from "attribute" to "option"
>> to help reinforce this.
(As I threatened here, I've changed terminology from "directive
attributes" to "directive options", in the docs and in the parser
code. See the latest CVS or snapshot.)
> OK, but this means sub-parsing the arguments instead of pulling data
> out of the attributes. My preference would be to have a standard
> way to specify not only the types of attributes, but default values
> for them. Then there are no optional attributes, just default or
> explicitly set. This is a more XML-ish way to go.
The concept of "optional" is useful. I'm using the XML idea of
#IMPLIED attributes, rather than defaults in the DTD (or in the
attribute parsing code, in our case). I've always found it more
flexible for the downstream parts of the processing chain to make the
decisions; keep your options open. If you put in default values
early, you lose the information that the attribute just *wasn't
specified*, and that can be valuable information lost.
Modelling directives on shell commands works well. Let's go with it.
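
A small sketch of the "leave it unset" approach, assuming the parser
hands downstream code a plain dict that contains only the options the
author actually wrote. The option name `width` and the function
`resolve_width` are hypothetical:

```python
def resolve_width(options, writer_default):
    """Decide the width downstream, where "not specified" is still visible."""
    if "width" in options:
        return options["width"], "explicit"
    # The option was absent, so the writer is free to choose.
    return writer_default, "writer default"


explicit = resolve_width({"width": 80}, 80)
implied = resolve_width({}, 80)
```

Both calls yield the same width, but only because the parser did not
fill in a default can the writer still tell the two cases apart.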
>> Looking at the directives in the test1.rst file in order, first
>> there's::
>>
>> .. include:: file:test2.rst
>>
>> I'm not comfortable with URL syntax here. I really think it's
>> YAGNI, and may open up a big can of worms. So that one should
>> become::
>>
>> .. include:: test2.rst
>
> Well, I think we *do* need URIs if we want includes to be useful to
> ZReST (which is not file-system based, but lives in an
> object-oriented database accessible by URI). Obviously, we don't
> need a file: URI for such a trivial (and file-based) example, that's
> just a way to test that URIs work in general.
OK, good use case. But I'm still uncomfortable with the "openAny"
function you sent::
    def openAny(path):
        try:
            # is it a file?
            return open(path)
        except:
            try:
                # is it a url?
                return urlopen(path)
            except (URLError, ValueError):
                # treat as a string
                return StringIO(path)
Especially the final StringIO part; the function should simply fail.
Should we check if "path" is a URL first, to avoid the "open(path)"
failure? Or is this a case of "look before you leap" vs. "it's easier
to ask forgiveness than permission"? Can a URI look like a filesystem
path? Seems a bit ambiguous to me.
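
One way the "look before you leap" variant could look, as a sketch
only: check for a recognised URL scheme first, and let any failure
propagate instead of falling back to a string. The names
`looks_like_url`, `open_source`, and the `URL_SCHEMES` tuple are
assumptions, not anything from the submitted code:

```python
import urllib.parse
import urllib.request

# Assumed set of schemes to treat as URLs; anything else is a path.
URL_SCHEMES = ("http", "https", "ftp", "file")


def looks_like_url(source):
    """True if `source` starts with one of the recognised schemes."""
    return urllib.parse.urlsplit(source).scheme in URL_SCHEMES


def open_source(source):
    """Open a URL or a filesystem path; raise instead of guessing."""
    if looks_like_url(source):
        return urllib.request.urlopen(source)  # may raise URLError
    return open(source)  # may raise OSError
```

Restricting the check to a known list of schemes sidesteps part of the
ambiguity: a path such as `C:\docs\test2.rst` parses with the scheme
`c`, which is not in the list, so it is treated as a file.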
Out of curiosity, what would a Zope/ZReST URL look like?
>> The next one is an "include"/"raw" hybrid::
>>
>>     .. include:: test3.rst
>>        :raw:
>>        :format: html
>
> Well, there were a few possibilities tossed around in email. I
> couldn't remember why the format was even needed (it's *raw*, what
> else do we need?), but then my data wouldn't show up using html.py
> and I had to dig through the writer code to figure out that raw
> nodes are tested for format type.
Yes, it wouldn't do much good to insert raw HTML in the middle of PDF.
The "raw" directive is meant to be a solution of last resort anyhow;
it's not portable.
>> It should be::
>>
>> .. raw:: html test3.rst
>
> OK, if that's the correct way to do it in reST, I'll do that. I
> still prefer the default attributes method discussed above rather
> than multiple arguments because it makes the names and types of the
> argument/option/attribute explicit.
Discussed further below.
>> I noticed that you've got the attributes *after* the content in the
>> next ones::
>>
>>     .. include::
>>        This is a <super>Test</super> of the <sub>Emergency</sub>
>>        <strike>Broadcasting</strike> "System"
>>        :raw:
>>        :format: html
>
> Yes, that was my mis-parsing of the directive (because I was working
> from images.py, not the spec).
Aha!
> The way I was grabbing (or failing to grab?) the content threw an
> exception if I put the attributes first, but worked OK if I put them
> after. I didn't like this either.
The spec is there for a reason. :-) But it's not immutable. It can be
changed when there's good reason. The code too; there's a *reason*
we're not at release 1.0 yet! This is a learning experience.
>> Attributes always come *before* the content (who's to say that
>> ":format: html" isn't valid raw data in some format?). In any case,
>> that directive should become::
>>
>> .. raw:: html
>
> OK. But the same arguments apply. Who's to say :format: html isn't
> valid raw data at the beginning of the data as well as easily as at
> the end? I agree that it's aesthetically and functionally better to
> have the attributes first, just not because the string ":format:
> html" might appear in the data.
That's why the blank line is necessary between options and content.
> Also, the problem with using multiple arguments comes up. Arguments to
> :raw: are
>
> path format
> or
> format
>
> But it seems more intuitive to me to put the required argument
> first: so the first argument to :raw: would always be format and we
> don't have to run tests or special cases. If there's a second
> argument, then it's path.
You mis-read. It *is* ".. raw:: format [path]" (the *second*
argument, "path", is optional).
> This still isn't as clear and explicit (to me) as using attributes,
> but better than having argument position be dependent on number of
> arguments.
It could be::
    .. raw:: format
       :source: path/URL
Nothing wrong with that, I suppose. Since it is an *optional*
argument, it does fit into the "option" mold. And in fact, it could
be even more explicit (and remove my misgivings at the same time) if
we made the option more specific::
    .. raw:: format
       :file: path

    .. raw:: format
       :url: URL
Although better option names may exist.
>> Next::
>>
>>     .. raw::
>>        This is <strong>RAW</strong>. Really, <em>really,</em> raw.
>>        :format: html
>>
>> Should become::
>>
>>     .. raw:: html
>>
>>        This is <strong>RAW</strong>. Really, <em>really,</em> raw.
>>
>> (Note the blank line.)
>
> Oh, there's a blank line *before* the content. I didn't get that.
Yes. The directives.txt file was mistaken for the description of
"raw" (although if you didn't *read* it, that shouldn't have mattered
;-). It's fixed now, and the specs are much more explicit. See
http://docutils.sf.net/spec/rst/reStructuredText.html#directives, and
http://docutils.sf.net/spec/rst/directives.html.
> I thought the directive *ends* with a blank line.
There was some ambiguity about that, gone now. The rule is, a
directive block ends with the end of indentation. That's it. If a
directive doesn't need a content block, it should be empty, otherwise
it's an error.
> Yikes. Now I see why you want get_directive_content(). Doesn't
> that introduce all sorts of possible ambiguities? Shouldn't there
> be one format for directives in all cases so that the users (typing
> in raw text with reST getting in their way as little as possible)
> don't have to remember the special cases for different directives?
Yes and yes, and I came to the same conclusion, as discussed above.
Just "parse_directive()" will be sufficient.
>> The last "raw" directive was::
>>
>>     .. raw::
>>        :include: test3.rst
>>        :format: html
>>
>> And should become::
>>
>> .. raw:: html test3.rst
>>
>> (This is the same as the second directive, therefore redundant.)
>
> The redundancy is deliberate. I was testing two different
> possibilities and wanted to be sure they both worked. They do the
> same thing, therefore they work.
Sorry, I should have said, "(This is *now* the same as the second
directive, therefore redundant.)". :-)
> So what you'd like to see is:
>
> :raw: [filepath] format
>
> :include: filepath (which must be reST)
>
> and functions for directives/__init__.py
>
> parse_directive()
>
> and
>
> get_directive_content()
>
> right?
Not quite. Just to be clear, let's summarize:
* ".. include:: filepath" (must be reStructuredText).
* ".. raw:: format" + either:
- a second, optional "filepath" argument, or
- a "source" (or equivalent) option, or
- (perhaps best?) *two* options, one for a filesystem path source,
the other for a URL source (and we can't use both in one
directive)
If the "external source" argument or option is specified (in
whatever form), there can be no directive content. If there is,
it's an error.
* A single function for directives/__init__.py, "parse_directive".
(Plus any auxiliary functions required, of course.)
* An exception for directives/__init__.py::
    class DirectiveParseError(docutils.ApplicationError): pass
(It doesn't need an "__init__" method, but it doesn't hurt much.)
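
The rules for "raw" in the summary above could be checked along these
lines. This is a sketch of the two-option variant only; the function
name `check_raw_source`, the option names `file` and `url`, and the
use of `ValueError` are all assumptions:

```python
def check_raw_source(options, content):
    """Return (kind, value) for a "raw" directive, or raise ValueError.

    kind is "file", "url", or "inline"; `options` is the dict of parsed
    options and `content` the list of content lines.
    """
    has_file = "file" in options
    has_url = "url" in options
    if has_file and has_url:
        raise ValueError('"file" and "url" cannot both be given')
    if has_file or has_url:
        if content:
            # An external source and inline content are mutually exclusive.
            raise ValueError("external source given; content not allowed")
        key = "file" if has_file else "url"
        return key, options[key]
    return "inline", "\n".join(content)


external = check_raw_source({"file": "test3.rst"}, [])
inline = check_raw_source({}, ["This is <strong>RAW</strong>."])
```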
The signature for "parse_directive" could be something like this::
    def parse_directive(match, type_name, state, state_machine,
                        option_presets, arguments=None,
                        option_spec={}, content=None):
        """
        Parameters:

        - `match`, `type_name`, `state`, `state_machine`, and
          `option_presets`: See `docutils.parsers.rst.directives.__init__`.
        - `arguments`: A 2-tuple of the number of ``(required,
          optional)`` whitespace-separated arguments to parse, or
          ``None`` if no arguments (same as ``(0, 0)``). If an
          argument may contain whitespace (multiple words), specify
          only one argument (either required or optional); the client
          code must do any context-sensitive parsing.
        - `option_spec`: A dictionary, mapping known option names to
          conversion functions such as `int` or `float`. ``None`` or
          an empty dict implies no options to parse.
        - `content`: A boolean; true if content is allowed. Client
          code must handle the case where content is required but not
          supplied (an empty content list will be returned).

        Returns a 4-tuple: list of arguments, dict of options, list of
        strings (content block), and a boolean (blank finish).

        Or raises `DirectiveParseError` with arguments: node (system
        message), boolean (blank finish).
        """
Once "parse_directive" is ready, we'll be able to convert all existing
directives to use it. As a side-effect, we will be able to drop the
"data" parameter from the directive function signature. The end
result will be much simpler directive code. Great result!
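
To make the `arguments` and `option_spec` parameters concrete, here is
a heavily simplified sketch. It works on plain strings rather than the
parser's `match`/`state` objects, ignores the blank-finish flag and the
multi-word single-argument case, and derives the exception from
`Exception` to stay self-contained. `parse_header` is a hypothetical
helper, not the proposed function itself:

```python
class DirectiveParseError(Exception):
    """Stand-in for the proposed exception class."""


def parse_header(header_lines, arguments=None, option_spec=None):
    """Parse argument and option lines into (argument_list, option_dict).

    `arguments` is a (required, optional) pair of counts; `option_spec`
    maps option names to conversion functions such as int.
    """
    required, optional = arguments or (0, 0)
    option_spec = option_spec or {}
    arg_words = []
    options = {}
    for line in header_lines:
        text = line.strip()
        if text.startswith(":"):
            # An option line looks like ":name: value".
            name, _, value = text[1:].partition(":")
            if name not in option_spec:
                raise DirectiveParseError("unknown option: %r" % name)
            try:
                options[name] = option_spec[name](value.strip())
            except ValueError as error:
                raise DirectiveParseError(
                    "bad value for %r: %s" % (name, error))
        elif options:
            raise DirectiveParseError("argument after options: %r" % text)
        else:
            arg_words.extend(text.split())
    if not required <= len(arg_words) <= required + optional:
        raise DirectiveParseError(
            "expected %d to %d arguments, got %d"
            % (required, required + optional, len(arg_words)))
    return arg_words, options


# ".. raw:: html test3.rst": one required and one optional argument.
raw_args, raw_options = parse_header(["html test3.rst"], arguments=(1, 1))
```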
--
David Goodger <go...@us...> Open-source projects:
- Python Docutils: http://docutils.sourceforge.net/
(includes reStructuredText: http://docutils.sf.net/rst.html)
- The Go Tools Project: http://gotools.sourceforge.net/
|
|
From: Aahz <aa...@py...> - 2002-08-27 14:48:42
|
On Mon, Aug 26, 2002, David Goodger wrote:
>
> I don't mind at all. Actually, I prefer the discussions online; it
> opens up the possibility for more input.
this is relevant to me, but can't talk about it with my broken wrist.
--
Aahz (aa...@py...) <*> http://www.pythoncraft.com/
Project Vote Smart: http://www.vote-smart.org/
|
|
From: David G. <go...@us...> - 2002-08-28 00:26:14
|
Aahz wrote:
> this is relevant to me, but can't talk about it with my broken
> wrist.
You have my sympathies. That must be affecting your book writing!
Hope you're back to normal soon.
Have you looked into one-handed keyboards? If it's your right hand
that's broken, there's a nifty half-keyboard here (designed for use
with a Palm, but works with anything I think):
http://www.aboutonehandtyping.com/bat.html
Also, there's the TouchStream keyboards that combine keyboard & mouse
in one unit. The TouchStream Mini might help:
http://www.fingerworks.com/overview.html
I'd like to get a TouchStream ST for myself, but can't afford it right
now.
--
David Goodger <go...@us...> Open-source projects:
- Python Docutils: http://docutils.sourceforge.net/
(includes reStructuredText: http://docutils.sf.net/rst.html)
- The Go Tools Project: http://gotools.sourceforge.net/
|
|
From: Aahz <aa...@py...> - 2002-09-10 01:39:46
|
On Tue, Aug 27, 2002, David Goodger wrote:
> Aahz wrote:
>>
>> this is relevant to me, but can't talk about it with my broken
>> wrist.
>
> You have my sympathies. That must be affecting your book writing!
No shit. :-( I'm up to 75% of my normal typing speed, but I can't
type as much as normal.
--
Aahz (aa...@py...) <*> http://www.pythoncraft.com/
Project Vote Smart: http://www.vote-smart.org/
|
|
From: Dethe E. <de...@ma...> - 2002-08-27 17:16:54
|
> And yes, you're right about returning a uniform set of data. But we
> actually need a 4-tuple; read on.
Yes, I originally wrote "4-tuple," then modified it to reflect what
you'd written about the spec.
> The concept of "optional" is useful. I'm using the XML idea of
> #IMPLIED attributes, rather than defaults in the DTD (or in the
> attribute parsing code, in our case). I've always found it more
> flexible for the downstream parts of the processing chain to make the
> decisions; keep your options open. If you put in default values
> early, you lose the information that the attribute just *wasn't
> specified*, and that can be valuable information lost.
I think that this complicates processing by the user (programmer) of a
directive. When searching for a value, we'll always have to test
whether the value is set or not. Note, that the #IMPLIED vs. DTD
default isn't what I'm arguing here--that's an implementation detail.
If you're parsing HTML or another known format and the spec says that
htmlOption.selected returns a boolean, you want to get a boolean whether
the option was set explicitly <option selected="false"/> or not
<option/>.
If all we want to know is, 'was this set?' we can initialize to false
(0) and pass the exists function (def exists(arg): return 1). If we
need to know existence as well as value, then return a tuple (def
existsInt(arg): return (1, int(arg))). Again, this is explicit vs.
implicit, and it allows you to change things like:
    if attributes.has_key('selected') and attributes['selected']:
        do_if_true()
    else:
        do_if_false()
to:
    if attributes['selected']:
        do_if_true()
    else:
        do_if_false()
Which looks cleaner to me. I guess there's some question of whether
this is a valid use case. Personally, I would rather know that I can
access an attribute and get a value back consistently, without having to
test for it. I guess it depends on where and how often we need to know
whether a value has not been set at all.
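
A sketch of what the defaults approach might look like, assuming the
directive declares its defaults in a dict next to its option spec. The
helper `apply_defaults` and the option name `selected` are only
illustrations:

```python
def apply_defaults(parsed_options, defaults):
    """Return a dict holding every known option, explicit or default."""
    merged = dict(defaults)
    # Explicitly written options override the declared defaults.
    merged.update(parsed_options)
    return merged


defaults = {"selected": False}
unset = apply_defaults({}, defaults)
chosen = apply_defaults({"selected": True}, defaults)
```

After merging, `attributes['selected']` can be read without a
membership test, at the cost of no longer knowing whether the author
wrote the option at all.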
> Modelling directives on shell commands works well. Let's go with it.
Does it? I find shell command options to be really difficult to
remember, after ten years of steady use. And shell commands are
inherently verbs, making their options adverbs, if you will. Documents
are inherently nouns, making their options/attributes adjectives, to
stretch a metaphor.
> > Well, I think we *do* need URIs if we want includes to be useful to
> > ZReST (which is not file-system based, but lives in an
> > object-oriented database accessible by URI). Obviously, we don't
> > need a file: URI for such a trivial (and file-based) example, that's
> > just a way to test that URIs work in general.
>
> OK, good use case. But I'm still uncomfortable with the "openAny"
> function you sent::
>
>     def openAny(path):
>         try:
>             # is it a file?
>             return open(path)
>         except:
>             try:
>                 # is it a url?
>                 return urlopen(path)
>             except (URLError, ValueError):
>                 # treat as a string
>                 return StringIO(path)
>
> Especially the final StringIO part; the function should simply fail.
> Should we check if "path" is a URL first, to avoid the "open(path)"
> failure? Or is this a case of "look before you leap" vs. "it's easier
> to ask forgiveness than permission"? Can a URI look like a filesystem
> path? Seems a bit ambiguous to me.
Sure. There's no need to have openAny handle strings if we're not
allowing the include directive to have a raw attribute. That was an
artifact of how I was interpreting include at the time.
> Out of curiosity, what would a Zope/ZReST URL look like?
Zope turns object references into URI paths. So a folder which lives at
http://myserver.com/myfolder
could have a document which lives at
http://myserver.com/myfolder/mydocument
which could be processed by a method edit, returning the document in an
edit form
http://myserver.com/myfolder/mydocument/edit
or maybe searched by xpath
http://myserver.com/myfolder/mydocument/xpath?/root/branch/leaf[@selected]
In other words, the URI paths are turned into object references in the
ZODB (Zope Object Database) and the default action is taken on them
(call, if it's a method, transform into HTML if it's data, etc.) It's a
rich and complex environment, with a bit too much magic going on behind
the scenes for my taste.
> > The way I was grabbing (or failing to grab?) the content threw an
> > exception if I put the attributes first, but worked OK if I put them
> > after. I didn't like this either.
>
> The spec is there for a reason. :-) But it's not immutable. It can be
> changed when there's good reason. The code too; there's a *reason*
> we're not at release 1.0 yet! This is a learning experience.
Yup. And I'm learning. I have read the spec, but I haven't
*internalized* it yet the way I have with, say, the DOM. So I'm
learning mostly from the examples of the code, and trying to keep to the
spec. But I know I'll still goof at this point (this is very much a
side project for me right now), so I'm very glad you're there for a
sanity check.
> > But it seems more intuitive to me to put the required argument
> > first: so the first argument to :raw: would always be format and we
> > don't have to run tests or special cases. If there's a second
> > argument, then it's path.
>
> You mis-read. It *is* ".. raw:: format [path]" (the *second*
> argument, "path", is optional).
Oops. My bad.
> > This still isn't as clear and explicit (to me) as using attributes,
> > but better than having argument position be dependent on number of
> > arguments.
>
> It could be::
>
>     .. raw:: format
>        :source: path/URL
>
> Nothing wrong with that, I suppose. Since it is an *optional*
> argument, it does fit into the "option" mold. And in fact, it could
> be even more explicit (and remove my misgivings at the same time) if
> we made the option more specific::
>
>     .. raw:: format
>        :file: path
>
>     .. raw:: format
>        :url: URL
Hey, I like it.
> > Yikes. Now I see why you want get_directive_content(). Doesn't
> > that introduce all sorts of possible ambiguities? Shouldn't there
> > be one format for directives in all cases so that the users (typing
> > in raw text with reST getting in their way as little as possible)
> > don't have to remember the special cases for different directives?
>
> Yes and yes, and I came to the same conclusion, as discussed above.
> Just "parse_directive()" will be sufficient.
Cool.
> * ".. include:: filepath" (must be reStructuredText).
>
> * ".. raw:: format" + either:
>
> - a second, optional "filepath" argument, or
> - a "source" (or equivalent) option, or
> - (perhaps best?) *two* options, one for a filesystem path source,
> the other for a URL source (and we can't use both in one
> directive)
There's actually a use case for including both. It's the same as using
both PUBLIC and SYSTEM identifiers for a DTD. Basically, it says, get
it from this URL if available, or this file if you can't get to the URL
for some reason. A fallback option, in other words.
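A minimal sketch of that fallback, assuming both a URL and a file path may be given. The function name `read_raw_source` and its `fetch` parameter are hypothetical (nothing like this exists in Docutils), and it is written in present-day Python rather than the Python of this thread:

```python
from urllib.request import urlopen


def read_raw_source(url=None, path=None, fetch=urlopen):
    """Return text from `url` if it can be fetched, else from `path`."""
    if url is not None:
        try:
            with fetch(url) as response:
                return response.read().decode("utf-8")
        except OSError:
            # URLError is an OSError subclass; without a file to fall
            # back on, the failure is passed on to the caller.
            if path is None:
                raise
    if path is not None:
        with open(path, encoding="utf-8") as source:
            return source.read()
    raise ValueError("either url or path must be given")
```

The `fetch` parameter is only there so the network call can be swapped out; with the default it uses `urllib.request.urlopen`.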
> If the "external source" argument or option is specified (in
> whatever form), there can be no directive content. If there is,
> it's an error.
>
> * A single function for directives/__init__.py, "parse_directive".
> (Plus any auxiliary functions required, of course.)
>
> * An exception for directives/__init__.py::
>
> class DirectiveParseError(docutils.ApplicationError): pass
>
> (It doesn't need an "__init__" method, but it doesn't hurt much.)
>
> The signature for "parse_directive" could be something like this::
>
>     def parse_directive(match, type_name, state, state_machine,
>                         option_presets, arguments=None,
>                         option_spec={}, content=None):
>         """
>         Parameters:
>
>         - `match`, `type_name`, `state`, `state_machine`, and
>           `option_presets`: See `docutils.parsers.rst.directives.__init__`.
>         - `arguments`: A 2-tuple of the number of ``(required,
>           optional)`` whitespace-separated arguments to parse, or
>           ``None`` if no arguments (same as ``(0, 0)``). If an
>           argument may contain whitespace (multiple words), specify
>           only one argument (either required or optional); the client
>           code must do any context-sensitive parsing.
>         - `option_spec`: A dictionary, mapping known option names to
>           conversion functions such as `int` or `float`. ``None`` or
>           an empty dict implies no options to parse.
>         - `content`: A boolean; true if content is allowed. Client
>           code must handle the case where content is required but not
>           supplied (an empty content list will be returned).
>
>         Returns a 4-tuple: list of arguments, dict of options, list of
>         strings (content block), and a boolean (blank finish).
>
>         Or raises `DirectiveParseError` with arguments: node (system
>         message), boolean (blank finish).
>         """
>
> Once "parse_directive" is ready, we'll be able to convert all existing
> directives to use it. As a side-effect, we will be able to drop the
> "data" parameter from the directive function signature. The end
> result will be much simpler directive code. Great result!
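A rough sketch of the two pieces that docstring describes, the ``(required, optional)`` argument count and the `option_spec` conversion, assuming the directive data and the raw option strings have already been pulled out of the block. The helper names `split_arguments` and `convert_options` are invented for illustration, they raise a plain `ValueError` instead of `DirectiveParseError`, and this is not the Docutils implementation:

```python
def split_arguments(text, arguments=None):
    """Split directive data according to a (required, optional) 2-tuple."""
    required, optional = arguments or (0, 0)
    text = text.strip()
    if required + optional == 1:
        # A single argument may contain whitespace, so keep it whole.
        words = [text] if text else []
    else:
        words = text.split()
    if not required <= len(words) <= required + optional:
        raise ValueError(
            f"expected {required} to {required + optional} argument(s), "
            f"got {len(words)}")
    return words


def convert_options(raw_options, option_spec=None):
    """Run each option value through its conversion function."""
    option_spec = option_spec or {}
    converted = {}
    for name, value in raw_options.items():
        if name not in option_spec:
            raise ValueError(f"unknown option: {name!r}")
        converted[name] = option_spec[name](value)
    return converted


raw_args = split_arguments("html path/to/file", (1, 1))
title = split_arguments("A Title With Spaces", (0, 1))
size = convert_options({"height": "50", "width": "100"},
                       {"height": int, "width": int})
```

Note that options which were not supplied simply do not appear in the returned dict.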
Looks great. Thanks for the comments and the help.
--Dethe
|
|
From: David G. <go...@us...> - 2002-08-28 00:30:25
|
Dethe Elza wrote:
>> The concept of "optional" is useful. I'm using the XML idea of
>> #IMPLIED attributes, rather than defaults in the DTD (or in the
>> attribute parsing code, in our case). I've always found it more
>> flexible for the downstream parts of the processing chain to make the
>> decisions; keep your options open. If you put in default values
>> early, you lose the information that the attribute just *wasn't
>> specified*, and that can be valuable information lost.
>
> I think that this complicates processing by the user (programmer) of
> a directive. When searching for a value, we'll always have to test
> whether the value is set or not. Note, that the #IMPLIED vs. DTD
> default isn't what I'm arguing here--that's an implementation
> detail. If you're parsing HTML or another known format and the spec
> says that htmlOption.selected returns a boolean, you want to get a
> boolean whether the option was set explicitly <option
> selected="false"/> or not <option/>.
But directives are *not* XML tags. I think that the now-obsolete
"attribute" terminology was misleading. Calling them "options" is
much better. Let's drop the name "attribute" altogether.
> If all we want to know is, 'was this set?' we can initialize to
> false (0) and pass the exists function (def exists(arg): return
> 1). If we need to know existence as well as value, then return a
> tuple (def existsInt(arg): return (1, int(arg))). Again, this is
> explicit vs. implicit, and it allows you to change things like:
>
>     if attributes.has_key('selected') and attributes['selected']:
>         do_if_true()
>     else:
>         do_if_false()
>
> to:
>
>     if attributes['selected']:
>         do_if_true()
>     else:
>         do_if_false()
>
> Which looks cleaner to me. I guess there's some question of whether
> this is a valid use case. Personally, I would rather know that I can
> access an attribute and get a value back consistently, without
> having to test for it. I guess it depends on where and how often we
> need to know whether a value has not been set at all.
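For reference, the proposal quoted above might look roughly like this when spelled out. The `exists` and `exists_int` converters follow the one-liners in the message; `parse_with_defaults`, the option names, and the default values are assumptions made for the example:

```python
def exists(arg):
    """Converter for flag options: the value itself is ignored."""
    return 1


def exists_int(arg):
    """Converter that records both presence and the integer value."""
    return (1, int(arg))


def parse_with_defaults(raw_options, option_spec, defaults):
    """Start from the defaults, then overwrite with converted values."""
    options = dict(defaults)
    for name, value in raw_options.items():
        options[name] = option_spec[name](value)
    return options


option_spec = {"selected": exists, "size": exists_int}
defaults = {"selected": 0, "size": (0, None)}

set_explicitly = parse_with_defaults({"selected": "", "size": "3"},
                                     option_spec, defaults)
left_out = parse_with_defaults({}, option_spec, defaults)
```

With this scheme every key is always present, so `options['selected']` can be tested directly.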
I first thought up a hypothetical use case counterargument, but it
wasn't convincing:
Say there's a global-impact directive with an option which sets
some persistent parameter, and this directive can occur multiple
times. Subsequent occurrences should use the previous directive's
option settings by default (i.e., don't override the persistent
parameter unless the option was explicitly set). The second time
the directive occurs, if there's a default value for the option,
there's no way to know that the option wasn't explicitly set, and
the default from the first directive's option will be lost.
Perhaps all we have to do is to use an invalid default value (like
``None``), and check for that before resetting the persistent
parameter.
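Sketched in code, that hypothetical case is small. The `depth` option and the `apply_directive` function are invented names, and ``None`` stands in for "not specified":

```python
def apply_directive(options, state):
    """Update the persistent setting only when the option was given."""
    depth = options.get("depth")  # None when the option is absent
    if depth is not None:
        state["depth"] = depth
    return state["depth"]


state = {"depth": 1}                          # persistent parameter
first = apply_directive({"depth": 3}, state)  # option set explicitly
second = apply_directive({}, state)           # option left out
```

The second call keeps the value set by the first, because an absent option never overwrites the stored one.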
You almost had me convinced. But after looking at the existing
directive code, I saw a problem. Directives that have options, like
"image" or "contents", use the options dictionary returned by
"docutils.utils.extract_extension_options" to update another
dictionary, either "option_presets" or "pending.details". In the case
of "image", the "option_presets" parameter contains an "alt" entry, a
filename to be used as the last-resort default for the "alt" option.
Setting default option values for unspecified options means that the
dict.update code would have to become very nasty. *That's* where the
lack of defaults pays off: a ``presets.update(options)`` operation
doesn't clobber legitimate presets with bogus defaults.
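A small demonstration of the difference, using plain dicts in place of the real parser output. The option values are made up; only the `dict.update` behaviour matters here:

```python
presets = {"alt": "symbol"}  # e.g. supplied by a substitution definition

# Without defaults: only the options actually written in the directive.
parsed = {"height": 50}
without_defaults = dict(presets)
without_defaults.update(parsed)

# With defaults: unspecified options arrive filled in with None.
parsed_with_defaults = {"alt": None, "height": 50, "width": None}
with_defaults = dict(presets)
with_defaults.update(parsed_with_defaults)  # "alt" preset is overwritten

# Repairing that means filtering the bogus entries out before updating.
repaired = dict(presets)
repaired.update({name: value
                 for name, value in parsed_with_defaults.items()
                 if value is not None})
```

The filtering step in the last case is the extra work that the no-defaults approach avoids.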
I don't find testing for the existence of dictionary keys particularly
onerous. And in this case, the solution seems to be much worse than
the problem.
Unless there's another solution?
>> Modelling directives on shell commands works well. Let's
>> go with it.
>
> Does it?
For me, yes. Can't speak for everybody. :-)
But at least it's an established standard that many people are
familiar with. And directives *are* commands; they're commands to the
parser from inside the document. Again, they're *not* XML elements.
> I find shell command options to be really difficult to remember,
> after ten years of steady use.
Details are always hard to remember. I don't know that we can do more
than standardize the directive interface (into arguments, options, and
content, which we're doing now), choose good names for options, and
document it all well. Directives are always going to be "power user"
tools.
> And shell commands are inherently verbs, making their options
> adverbs, if you will. Documents are inherently nouns, making their
> option/attributes adjective, to stretch a metaphor.
I'd say that directives are verbs too, although most are named after
nouns. The directive syntax should be read as "do X" or "make an X".
But the syntax is easily ignored.
>> Out of curiosity, what would a Zope/ZReST URL look like?
...
> In other words, the URI paths are turned into object references in
> the ZODB (Zope Object Database) and the default action is taken on
> them (call, if it's a method, transform into HTML if it's data,
> etc.) It's a rich and complex environment, with a bit too much
> magic going on behind the scenes for my taste.
I see. Thanks for the explanation!
>> * ".. raw:: format" + either:
>>
>> - a second, optional "filepath" argument, or
>> - a "source" (or equivalent) option, or
>> - (perhaps best?) *two* options, one for a filesystem path source,
>> the other for a URL source (and we can't use both in one
>> directive)
>
> There's actually a use case for including both. It's the same as
> using both PUBLIC and SYSTEM identifiers for a DTD. Basically, it
> says, get it from this URL if available, or this file if you can't
> get to the URL for some reason. A fallback option, in other words.
I don't know if we're ever "gonna need it", but go ahead if you're
keen.
--
David Goodger <go...@us...> Open-source projects:
- Python Docutils: http://docutils.sourceforge.net/
(includes reStructuredText: http://docutils.sf.net/rst.html)
- The Go Tools Project: http://gotools.sourceforge.net/
|
|
From: Dethe E. <de...@ma...> - 2002-08-28 17:25:17
|
[snipped a bunch of to and fro about default arguments]
> You almost had me convinced. But after looking at the existing
> directive code, I saw a problem. Directives that have options, like
> "image" or "contents", use the options dictionary returned by
> "docutils.utils.extract_extension_options" to update another
> dictionary, either "option_presets" or "pending.details". In the case
> of "image", the "option_presets" parameter contains an "alt" entry, a
> filename to be used as the last-resort default for the "alt" option.
> Setting default option values for unspecified options means that the
> dict.update code would have to become very nasty. *That's* where the
> lack of defaults pays off: a ``presets.update(options)`` operation
> doesn't clobber legitimate presets with bogus defaults.
But in this case, aren't the presets the defaults? Wouldn't making the
presets act specifically as defaults make the code simpler rather than
more complicated? Or am I just not getting it?
Sorry to be so blockheaded on such a relatively trivial matter.
> But at least it's an established standard that many people are
> familiar with. And directives *are* commands; they're commands to the
> parser from inside the document. Again, they're *not* XML elements.
No, they're XML Processing Instructions <0.5 wink>
--Dethe
|
|
From: David G. <go...@us...> - 2002-08-29 00:07:45
|
[David]
>> You almost had me convinced. But after looking at the existing
>> directive code, I saw a problem. Directives that have options,
>> like "image" or "contents", use the options dictionary returned by
>> "docutils.utils.extract_extension_options" to update another
>> dictionary, either "option_presets" or "pending.details". In the
>> case of "image", the "option_presets" parameter contains an "alt"
>> entry, a filename to be used as the last-resort default for the
>> "alt" option. Setting default option values for unspecified
>> options means that the dict.update code would have to become very
>> nasty. *That's* where the lack of defaults pays off: a
>> ``presets.update(options)`` operation doesn't clobber legitimate
>> presets with bogus defaults.
[Dethe]
> But in this case, aren't the presets the defaults?
Yes, but only if there *are* presets (they're rare), and only if
defaults *always* make sense (they don't). Take a look at a
substitution definition which uses an "image" directive::
    .. |symbol| image:: symbol.png
       :height: 50
       :width: 100
This will produce the following pseudo-XML::
    <substitution_definition name="symbol">
        <image alt="symbol" height="50" uri="symbol.png" width="100">
The "alt" attribute comes from the substitution name (bracketed by
"|"). With a straight "image" directive (not in a substitution
definition), what should the default "alt" option be? We rejected the
URL as a default long ago. The only default that would make sense
would be some form of "no value", like ``None``. But then the code
would have to special-case a ``None`` value, removing the option at
some point. Seems like a lot more work than checking for dictionary
keys before accessing them.
And what if there are *no* options? ::
    .. |symbol| image:: symbol.png
What should the default values be for the "height" and "width"
options? The directive code would have to go through the options
dictionary and remove any that don't make sense (like ``{'height':
None, 'width': None}``). Here's a perfect example of "not specified"
being different from "default value".
> Wouldn't making the presets act specifically as defaults make the
> code simpler rather than more complicated? Or am I just not getting
> it?
When you began this thread, I had a "won't work" feeling, but didn't
know exactly why. During the thread, you almost convinced me twice.
But the feeling never went away and every time I examined the existing
code and examples, I've rediscovered cases where it would be a lot
*more* work to go the defaults route. It just seems so much simpler
the way it is, and I don't see much benefit from using defaults. Have
I convinced you yet? :-)
If not, please code up a solution using defaults and convince me with
proof, not words. ;-)
> Sorry to be so blockheaded on such a relatively trivial matter.
Not at all. Such questions help us examine assumptions that were
made, often without sufficient initial thought. Why did I implement
this the way I did? Straight answer: it made sense at the time. But
now I have to justify the decision. Sometimes examining the decision
in detail invalidates it, sometimes it reinforces its validity.
Either way, asking the questions benefits the project.
For example, Dmitry's "sectnum" directive/transform hinted at a
weakness in the transform priority system. The weakness was fully
exposed by the "target-notes" directive/transform I implemented for
PEPs. It just goes to show that we can't cling to our code or ideas;
we have to be willing to throw them away if and when they're shown to
be deficient in some way. But that's not always easy to do.
|