|
From: Jason D. <ja...@in...> - 2002-12-27 22:36:41
|
Hello.

I'm trying to add support to the html4css1 writer for acronyms. I want
interpreted text that matches a known set of acronyms to output an
<acronym> element instead of the default <span class="interpreted">
element it outputs now. For example, `reST` should output <acronym
title="reStructuredText">reST</acronym>.

I'm new to the docutils source code so was wondering what the best way
to do this would be. I thought I'd add an --acronyms-file option to the
writer, load the acronyms and their titles out of that file, and then
check to see if the text for an interpreted node was one of those known
acronyms. Is this an appropriate approach or should I be looking at
implementing it via a transform? This makes sense to me since the
acronym element I'm trying to output is specific to HTML.

Thanks,
--
Jason Diamond <ja...@in...>
|
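A minimal sketch of the lookup proposed in the message above, for illustration only. It does not touch the Docutils API: the file format (one `ACRONYM = expansion` pair per line) and the names `load_acronyms` and `render_interpreted` are assumptions, not anything from the html4css1 writer.

```python
import html


def load_acronyms(lines):
    """Parse 'ACRONYM = expansion' lines into a dict (assumed file format)."""
    acronyms = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, title = line.partition("=")
        if sep:
            acronyms[name.strip()] = title.strip()
    return acronyms


def render_interpreted(text, acronyms):
    """Return an <acronym> element for known acronyms, else the default <span>."""
    escaped = html.escape(text)
    title = acronyms.get(text)
    if title is None:
        return '<span class="interpreted">%s</span>' % escaped
    return '<acronym title="%s">%s</acronym>' % (html.escape(title), escaped)


# In a real run the lines would come from the file named by --acronyms-file.
ACRONYMS = load_acronyms(["# sample entries", "reST = reStructuredText"])
print(render_interpreted("reST", ACRONYMS))
print(render_interpreted("foo", ACRONYMS))
```

Text that is not in the table falls through to the existing `<span class="interpreted">` output, so documents without an acronyms file would render as before.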
|
From: David G. <go...@py...> - 2002-12-29 17:03:30
|
Jason Diamond wrote:
> I'm trying to add support to the html4css1 writer for acronyms. I want
> interpreted text that matches a known set of acronyms to output an
> <acronym> element instead of the default <span class="interpreted">
> element it outputs now. For example, `reST` should output <acronym
> title="reStructuredText">reST</acronym>.
>
> I'm new to the docutils source code so was wondering what the best way
> to do this would be. I thought I'd add an --acronyms-file option to the
> writer, load the acronyms and their titles out of that file, and then
> check to see if the text for an interpreted node was one of those known
> acronyms. Is this an appropriate approach or should I be looking at
> implementing it via a transform? This makes sense to me since the
> acronym element I'm trying to output is specific to HTML.

There isn't a lot of support for interpreted text in Docutils yet. It has
been a "future expansion" feature, but it looks like the future has arrived.
The main application of interpreted text has been the Python Source Reader,
which is making slow progress. See PEP 258
(<http://docutils.sf.net/spec/pep-0258.html>) and
<http://docutils.sf.net/spec/pysource.html#interpreted-text> for details.

A quick & dirty way to implement what you want would be to indicate the role
of each acronym like this: "`reST`:acronym:" or "`reST`:a:". This will put
the role into a doctree node attribute, which is easy to check for in code.
But the text is butt-ugly and this approach become obsolete (read on).

From <http://docutils.sf.net/spec/notes.html#restructuredtext-parser>:

    Alan Jaffray suggested (and I agree) that it would be sensible to:

    - have a directive to specify a default role for interpreted text
    - allow the reST processor to take an argument for the default role
    - issue a warning when processing documents with no default role
      which contain interpreted text with no explicitly specified role

(I just added "and/or command-line option" after "directive".)

An application (or document or processing run) could specify a default role,
so a ":role:" prefix or suffix wouldn't be required; plain `backquotes`
would be sufficient.

Ideally and eventually, the "interpreted" element will disappear from the
Docutils doctree. In its place will be a customizable set of inline
elements including "acronym" and "index_entry", directly created by the
parser.

I won't be able to work on this for at least a week. If you're interested
in helping out, please do! If anything is unclear (and I'm sure there's
lots), please ask.

--
David Goodger <go...@py...> Open-source projects:
- Python Docutils: http://docutils.sourceforge.net/
(includes reStructuredText: http://docutils.sf.net/rst.html)
- The Go Tools Project: http://gotools.sourceforge.net/
|
|
From: Jason D. <ja...@in...> - 2003-01-03 20:32:48
|
On Sun, 2002-12-29 at 09:04, David Goodger wrote:
> A quick & dirty way to implement what you want would be to indicate the role
> of each acronym like this: "`reST`:acronym:" or "`reST`:a:". This will put
> the role into a doctree node attribute, which is easy to check for in code.
> But the text is butt-ugly and this approach become obsolete (read on).

This is what I did. It is ugly but not as much as in HTML. At least the
titles can be defined elsewhere.

> An application (or document or processing run) could specify a default role,
> so a ":role:" prefix or suffix wouldn't be required; plain `backquotes`
> would be sufficient.
>
> Ideally and eventually, the "interpreted" element will disappear from the
> Docutils doctree. In its place will be a customizable set of inline
> elements including "acronym" and "index_entry", directly created by the
> parser.

How will that work? Will acronyms and index entries use a different
syntax to differentiate themselves from each other?

Or are you saying the default role will specify that all interpreted
text is a certain type of element?

What if you wanted to identify both acronyms and index entries in the
same document?

What if an acronym was also an index entry?

Jason
|
|
From: David G. <go...@py...> - 2003-01-04 01:11:52
|
[David Goodger]
>> A quick & dirty way to implement what you want would be to indicate
>> the role of each acronym like this: "`reST`:acronym:" or
>> "`reST`:a:". This will put the role into a doctree node attribute,
>> which is easy to check for in code. But the text is butt-ugly and
>> this approach become obsolete (read on).
                 ^
I meant to say "*may* become obsolete".
[Jason Diamond]
> This is what I did. It is ugly but not as much as in HTML. At least
> the titles can be defined elsewhere.
(By "titles" I assume you mean roles [as in `interpreted text`:role:].
Correct? If not, please explain what you do mean.)
That's fine, but please be aware that the current internal
representation will probably be ripped out and replaced before long.
>> An application (or document or processing run) could specify a
>> default role, so a ":role:" prefix or suffix wouldn't be required;
>> plain `backquotes` would be sufficient.
>>
>> Ideally and eventually, the "interpreted" element will disappear
>> from the Docutils doctree. In its place will be a customizable set
>> of inline elements including "acronym" and "index_entry", directly
>> created by the parser.
>
> How will that work? Will acronyms and index entries use a different
> syntax to differentiate themselves from each other?
No, they'll still use interpreted text syntax. I meant that
internally, acronyms will be <acronym> elements and index entries will
be <index_entry> elements; there will be no <interpreted role="...">
elements. I don't plan on adding new syntax.
> Or are you saying the default role will specify that all
> interpreted text is a certain type of element?
Yes, all interpreted text *without an explicit role* will take on the
default implicit role. There will be one default interpreted text
role per run. The most-used role should become the default.
I don't know how this will be implemented yet though.
> What if you wanted to identify both acronyms and index entries in
> the same document?
If acronyms were the default role, then index entries would have to be
marked up explicitly. Or vice-versa. If the default role was neither
of these, they would both have to be marked up explicitly.
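A rough sketch of the default-role rule described above: interpreted text with an explicit role keeps it, and everything else takes the single default for the run. The regular expression and the name `find_interpreted` are illustrative assumptions, not the parser's actual implementation, and the pattern deliberately ignores the finer points of the inline markup recognition rules.

```python
import re

# Matches :role:`text` (prefix form), `text`:role: (suffix form), or bare `text`.
INTERPRETED = re.compile(
    r"(?::(?P<prefix>[A-Za-z][\w-]*):)?"
    r"`(?P<text>[^`]+)`"
    r"(?::(?P<suffix>[A-Za-z][\w-]*):)?"
)


def find_interpreted(source, default_role=None):
    """Return (text, role) pairs; text with no explicit role gets default_role."""
    results = []
    for match in INTERPRETED.finditer(source):
        role = match.group("prefix") or match.group("suffix") or default_role
        results.append((match.group("text"), role))
    return results


SAMPLE = "`reST` uses `markup`:index_entry: and :acronym:`CSS`."
print(find_interpreted(SAMPLE, default_role="acronym"))
```

With "acronym" as the default, only the index entry needs explicit markup. With no default at all, the role comes back as `None`, which is the case where the notes quoted earlier call for a warning.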
> What if an acronym was also an index entry?
That's a tough one. We're approaching the limitations of
reStructuredText now. Inline markup cannot be nested, so you would
either have to choose one of the roles, or make up a new combined
"acronym-index-entry" role. The latter case could easily become quite
complicated.
I suppose the interpreted text syntax *could* be extended for multiple
roles, something like this::

    `interpreted text`:role1,role2:

But at first glance that seems exceedingly ugly to me. Basically, if
a document is so complex as to require nested inline markup, perhaps
another markup system should be considered. reStructuredText does not
have the flexibility of XML (by design).
--
David Goodger <go...@py...>
|
|
From: Jason D. <ja...@in...> - 2003-01-04 04:22:52
|
On Fri, 2003-01-03 at 17:13, David Goodger wrote:
> [David Goodger]
> >> A quick & dirty way to implement what you want would be to indicate
> >> the role of each acronym like this: "`reST`:acronym:" or
> >> "`reST`:a:". This will put the role into a doctree node attribute,
> >> which is easy to check for in code. But the text is butt-ugly and
> >> this approach become obsolete (read on).
>                  ^
> I meant to say "*may* become obsolete"
>
> [Jason Diamond]
> > This is what I did. It is ugly but not as much as in HTML. At least
> > the titles can be defined elsewhere.
>
> (By "titles" I assume you mean roles [as in `interpreted text`:role:].
> Correct? If not, please explain what you do mean.)
I want to translate "`reST`:acronym:" into "<acronym
title='reStructuredText'>reST</acronym>". The value of the title
attribute has to be defined out-of-band since you can't parameterize
interpreted text. Right now I have them in a separate file but I'm
experimenting with creating a directive that will use some form of reST
syntax to let you define them.
> That's fine, but please be aware that the current internal
> representation will probably be ripped out and replaced before long.
So instead of an `interpreted` node, I'll have an `acronym` node? I'll
be moving my code to `visit_acronym` and `depart_acronym`?
> >> An application (or document or processing run) could specify a
> >> default role, so a ":role:" prefix or suffix wouldn't be required;
> >> plain `backquotes` would be sufficient.
> >>
> >> Ideally and eventually, the "interpreted" element will disappear
> >> from the Docutils doctree. In its place will be a customizable set
> >> of inline elements including "acronym" and "index_entry", directly
> >> created by the parser.
> >
> > How will that work? Will acronyms and index entries use a different
> > syntax to differentiate themselves from each other?
>
> No, they'll still use interpreted text syntax. I meant that
> internally, acronyms will be <acronym> elements and index entries will
> be <index_entry> elements; there will be no <interpreted role="...">
> elements. I don't plan on adding new syntax.
But you're letting people create new node types just by making up new
role names? Won't that make defining the DTD difficult? I like the
interpreted/role approach. It would be nice if the walk/walkabout
methods looked for the role attribute on interpreted nodes, though. They
could invoke `visit_interpreted_acronym` or
`visit_interpreted_index_entry` depending on the role attribute and fall
back to `visit_interpreted` in case the extended method doesn't exist.
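The dispatch idea in the paragraph above could look roughly like this. It is only a sketch: `InterpretedNode` and `RoleAwareVisitor` are stand-ins invented for illustration, not the real `docutils.nodes` classes or the actual walk/walkabout machinery.

```python
class InterpretedNode:
    """Stand-in for a doctree node (hypothetical, not docutils.nodes)."""

    def __init__(self, text, role=None):
        self.text = text
        self.role = role


class RoleAwareVisitor:
    """Prefers visit_interpreted_<role>, falling back to visit_interpreted."""

    def dispatch_visit(self, node):
        method = None
        if node.role:
            # Look for a role-specific method first.
            method = getattr(self, "visit_interpreted_" + node.role, None)
        if method is None:
            # No role, or no extended method: use the generic one.
            method = self.visit_interpreted
        return method(node)

    def visit_interpreted(self, node):
        return '<span class="interpreted">%s</span>' % node.text

    def visit_interpreted_acronym(self, node):
        return "<acronym>%s</acronym>" % node.text


visitor = RoleAwareVisitor()
print(visitor.dispatch_visit(InterpretedNode("reST", "acronym")))
print(visitor.dispatch_visit(InterpretedNode("parser", "index_entry")))
```

Because this visitor defines no `visit_interpreted_index_entry`, the second call falls back to the generic method.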
> > What if an acronym was also an index entry?
>
> That's a tough one. We're approaching the limitations of
> reStructuredText now. Inline markup cannot be nested, so you would
> either have to choose one of the roles, or make up a new combined
> "acronym-index-entry" role. The latter case could easily become quite
> complicated.
>
> I suppose the interpreted text syntax *could* be extended for multiple
> roles, something like this::
>
> `interpreted text`:role1,role2:
>
> But at first glance that seems exceedingly ugly to me. Basically, if
> a document is so complex as to require nested inline markup, perhaps
> another markup system should be considered. reStructuredText does not
> have the flexibility of XML (by design).
How about this::

    `interpreted text`:role1:role2:

Could we parameterize roles like this::

    `interpreted text`:role1(foo=bar):role2(baz=quux):

Yes, it's ugly. Just a different kind of ugly compared to XML.

The reason I was thinking about parameters (or attributes) was to
"override" a defined acronym::

    `CSS`:acronym:

could produce::

    <acronym title="Cascading Style Sheets">CSS</acronym>

by default but::

    `CSS`:acronym(title=Content Scrambling System):

could produce::

    <acronym title="Content Scrambling System">CSS</acronym>

when writing about DVDs and copy protection.
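To make the override idea concrete, here is a small sketch that parses the *proposed* parameterized form. The syntax is only the suggestion floated above and is not part of reStructuredText. The names `DEFAULT_TITLES`, `parse_params` and `render_acronym` are hypothetical, and emitting a bare `<acronym>` when no title is known is an assumption.

```python
import html
import re

# `text`:role: or `text`:role(key=value, ...): (the proposed, hypothetical syntax)
ROLE = re.compile(
    r"`(?P<text>[^`]+)`:(?P<role>[A-Za-z][\w-]*)(?:\((?P<params>[^)]*)\))?:"
)

# Stand-in for the out-of-band lookup table of default titles.
DEFAULT_TITLES = {"CSS": "Cascading Style Sheets"}


def parse_params(raw):
    """Turn 'a=b, c=d' into a dict; an absent parameter list gives {}."""
    params = {}
    for item in (raw or "").split(","):
        key, sep, value = item.partition("=")
        if sep:
            params[key.strip()] = value.strip()
    return params


def render_acronym(markup):
    """Render one acronym role, letting an inline title override the table."""
    match = ROLE.fullmatch(markup)
    if match is None or match.group("role") != "acronym":
        raise ValueError("not an acronym role: %r" % markup)
    text = match.group("text")
    params = parse_params(match.group("params"))
    title = params.get("title", DEFAULT_TITLES.get(text))
    if title is None:
        # Assumption: an unknown acronym is emitted without a title attribute.
        return "<acronym>%s</acronym>" % html.escape(text)
    return '<acronym title="%s">%s</acronym>' % (html.escape(title), html.escape(text))


print(render_acronym("`CSS`:acronym:"))
print(render_acronym("`CSS`:acronym(title=Content Scrambling System):"))
```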
I'm not really pushing for these--just thought I'd throw them out there
to see what you might think. I'd like to see reST stay simple but still
be powerful enough to get most of what I'd like to do done.
Jason
|
|
From: David G. <go...@py...> - 2003-01-04 15:10:48
|
Jason Diamond wrote:
> I want to translate "`reST`:acronym:" into "<acronym
> title='reStructuredText'>reST</acronym>". The value of the title
> attribute has to be defined out-of-band since you can't parameterize
> interpreted text. Right now I have them in a separate file but I'm
> experimenting with creating a directive that will use some form of reST
> syntax to let you define them.
I understand. Sounds reasonable. What happens when an attribute without a
lookup table entry comes along?
> So instead of an `interpreted` node, I'll have an `acronym` node? I'll
> be moving my code to `visit_acronym` and `depart_acronym`?
Yes.
> But you're letting people create new node types just by making up new
> role names?
No, not arbitrarily. Authors would have to choose from a pre-determined set
of roles, each having pre-existing software support. For instance, your
acronym example would have to have support in the parser, to create the
"acronym" elements and associate "title" attributes from a lookup table.
Interpreted text with unknown roles would generate errors.
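A sketch of the scheme outlined above, in which the parser knows a fixed set of roles, builds the real element for each, and rejects anything else. All names here (`ROLE_REGISTRY`, `interpret`, `UnknownRoleError`) and the tuple used to represent an element are illustrative assumptions, not the Docutils implementation.

```python
class UnknownRoleError(ValueError):
    """Raised for interpreted text whose role has no parser support."""


# Stand-in for the lookup table that supplies "title" attributes.
ACRONYM_TITLES = {"reST": "reStructuredText"}


def make_acronym(text):
    """Build an (element name, attributes, text) triple for an acronym."""
    attrs = {}
    if text in ACRONYM_TITLES:
        attrs["title"] = ACRONYM_TITLES[text]
    return ("acronym", attrs, text)


def make_index_entry(text):
    return ("index_entry", {}, text)


# The pre-determined set of roles, each with its own element factory.
ROLE_REGISTRY = {
    "acronym": make_acronym,
    "index_entry": make_index_entry,
}


def interpret(text, role):
    """Create the element for a known role; unknown roles are errors."""
    try:
        factory = ROLE_REGISTRY[role]
    except KeyError:
        raise UnknownRoleError('unknown interpreted text role "%s"' % role) from None
    return factory(text)


print(interpret("reST", "acronym"))
```

Since every element name comes from the registry, the set of possible elements stays closed, which is what keeps a DTD definable.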
> Won't that make defining the DTD difficult?
It would be impossible, which is a good indication that it's a bad approach
;).
> I like the interpreted/role approach. It would be nice if the walk/walkabout
> methods looked for the role attribute on interpreted nodes, though. They could
> invoke `visit_interpreted_acronym` or `visit_interpreted_index_entry`
> depending on the role attribute and fall back to `visit_interpreted` in case
> the extended method doesn't exist.
Interesting idea. In the scheme I'm considering though, there won't be any
"interpreted" elements left in the doctree by the time the Writer sees it.
Interpreted text is just syntax; the semantics haven't been finalized or
implemented yet. The current "interpreted" element is merely a placeholder
for testing purposes and will disappear. It's like the directive syntax.
Originally there was a generic "directive" element (before there was support
for directive processing in the parser), but it's gone now. It's the same
situation for interpreted text.
> How about this::
>
> `interpreted text`:role1:role2:
>
> Could we parameterize roles like this::
>
> `interpreted text`:role1(foo=bar):role2(baz=quux):
>
> Yes, it's ugly. Just a different kind of ugly compared to XML.
These are possibilities, and I'll note them. But they're too complex for my
liking. I really want to avoid complex inline markup.
> The reason I was thinking about parameters (or attributes) was to
> "override" a defined acronym::
>
> `CSS`:acronym:
>
> could produce::
>
> <acronym title="Cascading Style Sheets">CSS</acronym>
>
> by default but::
>
> `CSS`:acronym(title=Content Scrambling System):
>
> could produce::
>
> <acronym title="Content Scrambling System">CSS</acronym>
>
> when writing about DVDs and copy protection.
My first inclination would be to use the existing substitution/directive
mechanism for something like this::

    `CSS`:acronym: is used for HTML, and |CSS| is used for DVDs.

    .. |CSS| acronym:: Content Scrambling System
> I'm not really pushing for these--
Good!
> just thought I'd throw them out there to see what you might think.
The effort is appreciated.
> I'd like to see reST stay simple but still
> be powerful enough to get most of what I'd like to do done.
I don't mind if directives are used for the gnarly stuff, but I'm loath to
reduce readability by adding inline markup complexity.
--
David Goodger <go...@py...>
|
|
From: David G. <go...@py...> - 2003-01-04 15:23:14
|
I wrote:
> What happens when an attribute without a lookup table entry comes along?
                       ^^^^^^^^^ acronym
-- David Goodger <go...@py...>
|
|
From: Aahz <aa...@py...> - 2003-01-05 02:40:46
|
On Sat, Jan 04, 2003, David Goodger wrote:
> Jason Diamond wrote:
>>
>> But you're letting people create new node types just by making up new
>> role names?
>
> No, not arbitrarily. Authors would have to choose from a
> pre-determined set of roles, each having pre-existing software
> support. For instance, your acronym example would have to have
> support in the parser, to create the "acronym" elements and associate
> "title" attributes from a lookup table.
>
> Interpreted text with unknown roles would generate errors.
-1 unless it's easier than the current system for adding new directives.
Currently to handle a directive you need support in both parser and
writer, but for interpreted text you only need support in the writer.
While I understand what you want to do (and it would make certain parts
of what I want to do with interpreted text easier), I'm concerned about
adding complexity.
Because I know you're going to ask ;-), the current use I'm making of
interpreted text is chapter/figure/list references::

    To understand better how mutable and immutable objects work, see
    code listings :list:`mutable.py` and :list:`immutable.py`, as well
    as figures :figure:`mutable.eps` and :figure:`immutable.eps`.
    You'll also want to read :chapter:`Data`.
--
Aahz (aa...@py...) <*> http://www.pythoncraft.com/
"There are three kinds of lies: Lies, Damn Lies, and Statistics." --Disraeli
|
|
From: David G. <go...@py...> - 2003-01-05 05:45:40
|
[Jason Diamond]
>>> But you're letting people create new node types just by making up new
>>> role names?
[David Goodger]
>> No, not arbitrarily. Authors would have to choose from a
>> pre-determined set of roles, each having pre-existing software
>> support. For instance, your acronym example would have to have
>> support in the parser, to create the "acronym" elements and associate
>> "title" attributes from a lookup table.
>>
>> Interpreted text with unknown roles would generate errors.
[Aahz]
> -1 unless it's easier than the current system for adding new directives.
Ease of implementation isn't even on the radar at this point. Correctness
is dead center. For most interpreted text processing, it is *wrong* to
delay processing (interpretation) until the Writer. It is *absolutely
correct* to handle the processing (at least the initial stage) in the
Parser.
I apologize that I can't explain interpreted text any better than I already
have, in the markup spec (spec/rst/reStructuredText.txt), PEP 258,
spec/pysource.txt, previous posts, and this one (I *was* planning to go to
bed early tonight... so please read carefully). While I'm sure my mental
model will have to adapt as the implementation unfolds (it's a process of
discovery), I'm pretty confident of its foundation.
The current state of the implementation is a red herring. It's going to
change, because it's a half-assed, incomplete, and totally *wrong*
implementation.
But feel free to try to convince me otherwise. :)
> Currently to handle a directive you need support in both parser and
> writer, but for interpreted text you only need support in the writer.
You only need Writer support for a directive if the directive introduces a
new element into the DTD. Some do: the directives whose only reason for
being is to allow the creation of these elements (like "image" and "meta").
Most interesting ones don't though (like "contents" or "include"); they just
insert standard elements into the doctree.
In most cases there's nothing Writer-specific about interpreted text.
Interpreted text is entirely a reStructuredText markup construct, a way to
get around built-in limitations of the medium. No other form of markup
would require anything resembling an "<interpreted>" element.
> While I understand what you want to do (and it would make certain parts
> of what I want to do with interpreted text easier), I'm concerned about
> adding complexity.
If an author wants an "acronym" element, they should get one. A real
acronym element, not an <interpreted role="acronym"> surrogate. If we allow
the surrogate and take that line of reasoning to the extreme, all we'd need
for a DTD is this bogosity ::

    <!ELEMENT element (#PCDATA | element)*>
    <!ATTLIST element
        name     NMTOKEN  #REQUIRED
        attlist  CDATA    #IMPLIED>

(Which BTW is essentially what DOM is, but it has good reason.)
My reasoning is that all supported interpreted text roles must be known by
the Parser. There's no guarantee that an arbitrary role will be supported
by the eventual Writer. Adding a new role is tantamount to adding a new
element to the DTD, may require extensive support, and shouldn't be taken
lightly. However, there should be a limited number of such roles. Allowing
different Writers to support different elements would fragment the system
beyond repair. It must remain a unified whole. The only place where
variation is acceptable is at the start, at the Reader/Parser interface
(with possible transforms inserted *by* the Reader). Once past the
Transformer, no variation from standard Docutils doctree is possible.
On the other hand, most new elements won't require Writer support, because
they will be Reader-specific extensions to the DTD, and will be converted to
standard elements by Reader-specific transforms. Case in point: the Python
Source Reader, which will use interpreted text extensively. The default
role will be "Python identifier", which will be further interpreted by
namespace context into <class>, <method>, <module>, <attribute>, etc.
elements (see spec/pysource.dtd), which will be transformed into standard
hyperlink references, which will be processed by the various Writers. The
point is that no Writer will need to have any knowledge of the Python-Reader
origin of these elements.
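The Reader-specific transform described above might be pictured like this: elements that only the Reader knows about are rewritten into a standard element before any Writer sees the tree. The `Node` class, the `to_standard` function and the `refname` attribute as used here are simplified assumptions for illustration. The real doctree classes and pysource transforms are not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Tiny stand-in for a doctree element (hypothetical)."""
    tag: str
    text: str = ""
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


# Element names that only the (hypothetical) Python Source Reader would emit.
READER_SPECIFIC = {"class", "method", "module", "attribute"}


def to_standard(node):
    """Return a copy of the tree with Reader-specific elements replaced
    by standard reference elements."""
    children = [to_standard(child) for child in node.children]
    if node.tag in READER_SPECIFIC:
        return Node("reference", node.text, {"refname": node.text}, children)
    return Node(node.tag, node.text, dict(node.attrs), children)


tree = Node("paragraph", children=[Node("class", "Parser"), Node("emphasis", "x")])
result = to_standard(tree)
print([child.tag for child in result.children])
```

After the transform, a Writer only ever meets `reference` and the other standard elements, so it needs no knowledge of where they came from.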
(I'm sorry if this gives you a sinking feeling, like "all my work is going
to waste". Rest assured, it won't. If you check your code into your
sandbox, all the good ideas will eventually migrate into the main tree. The
eventual form these ideas take may be very different from the current form
though.)
> Because I know you're going to ask ;-), the current use I'm making of
> interpreted text is chapter/figure/list references::
>
> To understand better how mutable and immutable objects work, see
> code listings :list:`mutable.py` and :list:`immutable.py`, as well
> as figures :figure:`mutable.eps` and :figure:`immutable.eps`.
> You'll also want to read :chapter:`Data`.
Thanks for answering without me having to ask. :) The obvious follow-up
questions are: how should all of that be processed and rendered? What is
the purpose and semantics? Why not just use standard hyperlink references?
(I could guess, but I'd rather not.)
I hope this all makes sense. Good night!
--
David Goodger <go...@py...> Open-source projects:
- Python Docutils: http://docutils.sourceforge.net/
(includes reStructuredText: http://docutils.sf.net/rst.html)
- The Go Tools Project: http://gotools.sourceforge.net/
|