[Docutils-develop] RE: dockbook writer

Hi David,

As you mentioned you'd like these posted, I've sent this to the
docutils-develop list.

I hope that's okay.

Thanks very much for taking the time to answer all my questions so
thoroughly.

My responses are all inline below.

> -----Original Message-----
> From: David Goodger [mailto:go...@us...]
> Sent: Tuesday, June 18, 2002 8:17 AM
> To: Oliver Rutherfurd
> Subject: Re: dockbook writer
> 
> 
> Here are the answers to the questions you posed.  I'd still like to put
> them on a mailing list (first part on Doc-SIG, second on 
> Docutils-develop perhaps).
> 
> > The biggest holes are `option` and `field` lists -- I didn't know how
> > to map those to docbook elements, so I steered clear of them.
> 
> There may not be direct equivalents (they may be Docutils 
> innovations), so it may be necessary to construct an equivalent using 
> DocBook elements as "primitives".  Similarly for HTML output, when a 
> closely corresponding element is not available, we have to treat 
> DocBook as a display format (unfortunately).
> 
> Having scoured the DocBook reference, I can't find a close equivalent 
> for option lists.  The Docutils "option" can be built up with DocBook 
> "option" and "replaceable" elements.
> The Docutils "option_list" may best be built with a DocBook 
> "table" or "variablelist".  Here's a pertinent example, 
> "Generating a man page": 
> http://www.tldp.org/HOWTO/mini/DocBook-Install/using.html#AEN600

Since my original message, I'd gone ahead and done this using "option", 
"replaceable", and "table" elements.  I'll leave it like that for now, 
unless people think "variablelist" is more appropriate.
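
A minimal sketch of that mapping, written with Python's xml.etree.ElementTree
for brevity.  The function names and the sample options are made up for
illustration; this is not the code in docbook.py.

```python
import xml.etree.ElementTree as ET


def docbook_option(flag, argument=None, delimiter=" "):
    """Build a DocBook <option>; the argument placeholder goes in <replaceable>."""
    option = ET.Element("option")
    option.text = flag
    if argument is not None:
        option.text = flag + delimiter
        ET.SubElement(option, "replaceable").text = argument
    return option


def docbook_option_row(options, description):
    """One table row: the option group in the first entry, its description in the second."""
    row = ET.Element("row")
    first = ET.SubElement(row, "entry")
    for index, (flag, argument) in enumerate(options):
        element = docbook_option(flag, argument)
        if index < len(options) - 1:
            element.tail = ", "  # separator between alternative spellings
        first.append(element)
    second = ET.SubElement(row, "entry")
    ET.SubElement(second, "para").text = description
    return row


row = docbook_option_row([("-o", "FILE"), ("--output", "FILE")],
                         "Write output to FILE.")
print(ET.tostring(row, encoding="unicode"))
```

Switching to "variablelist" would only change the wrapper elements; the
"option" and "replaceable" parts would stay the same.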

> The situation is similar for field lists.  Tables would seem 
> appropriate, since field lists are essentially the equivalent of 
> database records.  But "variablelist" or "glosslist" may be 
> preferable.

I went with "variablelist" here.
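
Roughly, each field becomes a "varlistentry" with the field name as the
"term" and the field body in the "listitem".  A sketch, assuming the fields
have already been reduced to (name, text) pairs (the real writer works on
Docutils nodes, and the function name here is hypothetical):

```python
import xml.etree.ElementTree as ET


def field_list_to_variablelist(fields):
    """Turn (name, body) pairs into a DocBook <variablelist> element."""
    variablelist = ET.Element("variablelist")
    for name, body in fields:
        entry = ET.SubElement(variablelist, "varlistentry")
        ET.SubElement(entry, "term").text = name
        listitem = ET.SubElement(entry, "listitem")
        ET.SubElement(listitem, "para").text = body
    return variablelist


fields = [("Author", "Oliver Rutherfurd"), ("Status", "rough draft")]
print(ET.tostring(field_list_to_variablelist(fields), encoding="unicode"))
```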

> > > You could add command-line options to choose the top-level element
> > > of the DocBook output.  Articles and chapters contain sections, and
> > > books contain chapters.
> >
> > I've done what you suggested -- that was the approach I'd started 
> > with, but if `book` is the root element, chapters are not correctly 
> > handled.
> > 
> > I was thinking if `book` is the root element, then all first level 
> > sections could be chapters -- but I haven't done that yet.
> 
> I've been thinking about this from the other end: the source files. 
> When writing a book, the author probably wants to split it up into 
> files, perhaps one per chapter (but perhaps even more detailed). 
> However, we'd like to be able to have references from one chapter to 
> another, and have continuous numbering (pages and chapters, as 
> applicable).  Of course, none of this is implemented yet.  There has 
> been some thought put into some aspects; see
> http://docutils.sf.net/spec/notes.html#misc-include and 
> http://docutils.sf.net/spec/notes.html#reference-merging.
> 
> When I was working with SGML in Japan, we had a system where there was
> a top-level coordinating file, book.sgml, which contained the 
> top-level structure of a book: the <book> element, containing the book
> <title> and empty component elements (<preface>, <chapter>, 
> <appendix>, etc.), each with filename attributes pointing to the 
> actual source for the component.  Something like this::
> 
>     <book id="bk01">
>     <title>Title of the Book</title>
>     <preface inrefid="pr01"></preface>
>     <chapter inrefid="ch01"></chapter>
>     <chapter inrefid="ch02"></chapter>
>     <chapter inrefid="ch03"></chapter>
>     <appendix inrefid="ap01"></appendix>
>     </book>
> 
> The processing system would process each component separately, but it 
> would recognize and use the book file to coordinate chapter and page 
> numbering, and keep a persistent ID -> (title, page number) mapping 
> database for cross-references.  Docutils could use a similar system 
> for large-scale, multipart documents.  Aahz is beginning a book
> project with Docutils, so this would be immediately applicable.
> 
> Please don't worry, everyone; I'm *not* advocating that implementing 
> this should take top priority.  Just putting out some food for 
> thought.  As always, we'll implement what we need, as the need becomes
> apparent, and no more.
>
>
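
To make the coordinating-file idea concrete, here is a small sketch that
reads the book file quoted above and assigns chapter and appendix numbers.
It assumes the file is well-formed XML (the original system was SGML), it
does not resolve the "inrefid" values to source files, and the function
name is invented for the example:

```python
import xml.etree.ElementTree as ET

BOOK_SOURCE = """\
<book id="bk01">
<title>Title of the Book</title>
<preface inrefid="pr01"></preface>
<chapter inrefid="ch01"></chapter>
<chapter inrefid="ch02"></chapter>
<chapter inrefid="ch03"></chapter>
<appendix inrefid="ap01"></appendix>
</book>
"""


def number_components(book_source):
    """Map each component ID to (kind, number).

    Chapters and appendices are numbered independently; other
    components (such as the preface) get None as their number.
    """
    book = ET.fromstring(book_source)
    counters = {"chapter": 0, "appendix": 0}
    numbering = {}
    for component in book:
        ref = component.get("inrefid")
        if ref is None:  # e.g. the <title> element
            continue
        kind = component.tag
        if kind in counters:
            counters[kind] += 1
            numbering[ref] = (kind, counters[kind])
        else:
            numbering[ref] = (kind, None)
    return numbering


for ref, (kind, number) in number_components(BOOK_SOURCE).items():
    print(ref, kind, number)
```

A persistent ID -> (title, page number) database would hang off the same
mapping, but that part needs the processed components and is left out.
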
> > Also, I was thinking that titles preceding elements like tables could
> > be treated as the title for that element -- as DocBook seems to want
> > them.  That's one thing that I've wanted to be able to do for code
> > samples (in literal blocks), but haven't seen that one can yet.  I saw
> > that the figure directive can do this, but that seems only for images.
> 
> We can't use ordinary section headers for table etc. element titles, 
> because they define the section structure.  It would be ambiguous: if 
> a header is immediately followed by a table, should the title go to 
> the table or should it begin a section?  Some other mechanism is 
> needed for a titled element; perhaps a directive.  Any ideas?

I'd been afraid of that ;-). I'd added it but have since removed it.

How about something like this?

::

  [Table Title]

  +-----+-----+-----+
  | one | two | ... |
  +=====+=====+=====+
  | abc | def | ghi |
  +-----+-----+-----+

> In most cases, DocBook has an untitled "informal" version of the 
> element.  In this case, use "informaltable".
> 
> Answers to other questions in the code, & comments about the code:
> 
> - "QUESTION: if using `book` as root element, should `sect1` be
>   treated as a chapter?"
> 
>   The Docutils structure goes document-section-section-...; sections
>   are recursively nested and don't have numbered levels.  But other
>   than that technicality, I would say that yes, a first-level section
>   inside a "book" should be a "chapter".

Okay, I replaced numbered "sect" elements with "section".

> - I think you're using `interpreted text` where you should be using
>   ``inline literals`` or just "quotes" in docstrings.

Oops ;-).

> - "`contact`: {doctype}info/author/email"
>   # QUESTION: should contact be handled differently?
> 
>   ``visit_contact`` is OK, as long as there is also an author's name,
>   and the name is already in the "{doctype}info/author".  The order is
>   significant to DocBook.
> 
> - "topic": You're treating this as a bibliographic element; it isn't.
>   It's equivalent to a DocBook "sidebar" element.  It shouldn't be in
>   "{doctype}info".  See docutils.dtd for more.

Okay, I used "sidebar" for this.

> - "Note:: `author` and `authors` are going to be a nuisance because
>   DocBook expects the name to be broken up into `honorific`,
>   `firstname`, and `surname`."
> 
>   And DocBook doesn't give us a larger-grained alternative :-(.  Maybe
>   use "othername"?  Don't try to split up a name into its components
>   -- can't work.

Okay, I still have to clean up the document attribute handling, but I'll
do that.

> - "If it makes sense to try to use titles which appear directly before
>   tables and whatnot as title for the table, and other similar
  manipulations, would it be easier to construct the output using
  minidom? That way the element manipulation could be done in a
>   separate pass."
> 
>   As I stated earlier, it doesn't make sense to overload titles that
>   way.  As for manipulating the data, it would be best to walk a
>   subtree while building a copy, rather than manipulating the data
>   directly.  The code supports this: create a visitor, call ``walk``
>   or ``walkabout`` on the head of the subtree you want to walk.
> 
> - "TODO: ensure that only `article`, `book`, and `chapter` are
>   accepted"
> 
>   I'm going to add a "choice" option type to Optik, so that this can
>   be verified automatically.  I'm not sure yet if it will go into
>   Optik itself or into a Docutils-specific subclass.

Okay, then I'll just leave this be for now.
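
For reference, Optik was later added to the Python standard library as
optparse, which has a "choice" option type.  A sketch of the check using
optparse; the ``--doctype`` option name and the default are assumptions for
the example, not the writer's actual interface:

```python
from optparse import OptionParser

# The three root elements named in the TODO above.
DOCTYPES = ("article", "book", "chapter")


def build_parser():
    """Return a parser that rejects any root element not in DOCTYPES."""
    parser = OptionParser()
    parser.add_option(
        "--doctype",
        type="choice",
        choices=DOCTYPES,
        default="article",
        help="top-level DocBook element: " + ", ".join(DOCTYPES),
    )
    return parser


options, args = build_parser().parse_args(["--doctype", "book"])
print(options.doctype)
```

An unlisted value makes the parser print a usage error and exit, so the
writer itself never sees an unsupported root element.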

> - "QUESTION: should encoding be variable?"
> 
>   Yes, it should.  But how?  See my response to Ueli Schlaepfer's post
>   on Doc-SIG.  (And note my comment further down, "# @@@ %
>   output_encoding".)
> 
> - "QUESTION: does 'EN' in doctype need to be swapped out per lang?"
> 
>   No, I don't think so.  It's the language of the DTD, not the
>   document.
>
> - "# TODO: author"
> 
>   I'd rather see ``def visit_author(...): pass`` than a lambda.  Or is
>   that just a reminder that it isn't done yet?

The lambdas were just so I could process the document I was using to
test.  Also, that way I wouldn't mistake them for handlers with ``pass``
blocks.

> - "# TODO: authors" etc.
> 
>   Even if they do nothing, it would be best to handle all elements.
  Run tools/test.txt through your front-end to test the writer.
> 
> - "def visit_bullet_list(self, node):
>   # QUESTION: what to do about this?"
> 
>   That's an HTMLism, to keep the table of contents list compact (no
>   whitespace).  Not relevant to DocBook.  In fact, I'm not sure what
>   the "contents" directive should do with DocBook output.  Presumably,
>   the DocBook reader can construct its *own* table of contents.

Okay, that's what I thought -- but better to make sure.

>   "def visit_paragraph(self, node):
>   # QUESTION: what is self.topic_class == 'contents'?"
> 
>   That too; table of contents HTMLism.

Okay.

> - "def visit_classifier(self, node):
>   # QUESTION: should this be handled differently?"
> 
>   Perhaps some kind of markup would be appropriate, like "type".

Done.

> - "# QUESTION: what is this for?
>   def visit_decoration(self, node):"
> 
>   See "Updates to Docutils", Docutils-develop list, 2002-05-31.

So based on how I read it, nothing special needs to be done for this?

> - "# QUESTION: is docinfo_item needed or just a convenience function in
>   ``html4css1.py``?"
> 
>   It's just a convenience function, reducing duplication of code.  May
>   not be applicable to DocBook.

Okay.

> - "# QUESTION: is this the best mapping?
>   def visit_doctest_block(self, node):"
> 
>   Looks fine to me.  The semantics are about as good as we can expect.
>   I guess it depends if we ever implement an "example" in Docutils.
> 
> - "def visit_entry(self, node):"
> 
>   Since your tagname is invariant, you don't need the extra "context"
>   complexity.
> 
>   "# QUESTION: how to handle empty cells?"
> 
>   The HTML browsers I know of require a &nbsp; in empty cells for the
>   table to look right.  I don't know about DocBook; it probably
>   depends on what software it ends up in.  I'd hope that any software
>   sophisticated enough to process DocBook wouldn't require that kluge,
>   but your guess is as good as mine.
> 
> - "def visit_footnote(self, node):
>   # FIXME: footnote links don't work, because footnote is not in same
>   section?"
> 
>   Is that a requirement?
> 
>   Looking at DocBook footnotes, they're expected to be defined inline
>   (inside the paragraph text), and a mark is left behind.  Perhaps
>   Docutils footnotes have to be moved to replace the first
>   corresponding footnote_reference?  Tricky.
>
>   I never liked that aspect of DocBook: body-level elements inside
>   paragraphs.  Yuck.

I think I'll have to re-read your comments above and read a little more 
about DocBook's footnotes, as I don't totally understand the issues
here.

> - visit_image: According to "DocBook: The Definitive Guide", "In
>   DocBook V5.0, Graphic will be discarded. At that time, graphics will
>   have to be incorporated using MediaObject or InlineMediaObject."

Okay, switched from "graphic" to "imagedata".

> - "# QUESTION: is this the best mapping?
>   def visit_interpreted(self, node):"
> 
>   "interpreted" will become quite complex.  It may have a variety of
>   mappings.  "constant" is fine for now, but mark it incomplete.  I've
>   added this to html4css1.py::
> 
>         # @@@ Incomplete, pending a proper implementation on the
>         # Parser/Reader end.
> 
>   The main use of "interpreted" will be as an implicit hyperlink, with
>   endpoint determined by its context.  But this will be resolved
>   before the document ever gets to the writer.  I expect that
>   eventually, "interpreted" will disappear (replaced by specific
>   instances), so any arbitrary mapping is fine for now.
> 
> - visit_label: Perhaps this should go into a footnote's "label"
>   attribute?

I did this, but I got a UnicodeError with the second auto-sequenced 
item -- the one that follows "*", so I commented it out for now.

> - "# QUESTION: where does legend come from?"
> 
>   See http://docutils.sf.net/spec/rst/directives.html#figure.

Thanks, I'd missed it before.

> - "# QUESTION: does this need to be unescaped?
>   # QUESTION: is informalexample the best mapping?
>   def visit_literal_block(self, node):"
> 
>   I would think that a simple "programlisting" would be best, no
>   "informalexample" needed.  As for escaping: either use a CDATA
>   section, or escape [<>&] to entities.

I used a CDATA section.
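
Both approaches side by side, as a sketch (the helper names are made up).
The one catch with CDATA is that the text itself must not contain "]]>",
so the sketch splits that sequence across two CDATA sections:

```python
from xml.sax.saxutils import escape


def as_escaped(text):
    """Escape &, < and > to entities inside a <programlisting>."""
    return "<programlisting>%s</programlisting>" % escape(text)


def as_cdata(text):
    """Wrap text in a CDATA section inside a <programlisting>.

    "]]>" would end the section early, so it is split: the first
    section ends after "]]" and a new one starts with ">".
    """
    safe = text.replace("]]>", "]]]]><![CDATA[>")
    return "<programlisting><![CDATA[%s]]></programlisting>" % safe


sample = "if a < b and x[y[0]]> 1: print('&')"
print(as_escaped(sample))
print(as_cdata(sample))
```

Either form should parse back to the same text; CDATA just keeps the
generated file easier to read by eye.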

> - "def visit_meta(self, node):
>   # QUESTION: should there be a notice that this element
>   #   is a no-op for DocBook?"
>
>   I don't think you even *need* handlers for "meta"; they'll be
>   removed before they get here.  See
>   docutils/transforms/components.py, class Filter.

Okay, removed.

> - "def visit_reference(self, node):
>   # QUESTION: should different elements be used
>   #   for different types?"
>   
>   Yes, it looks like "ulink" is appropriate for "refuri", and "link"
>   is appropriate for "refid" or "refname".

Done.
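
In outline, the choice looks like this.  The sketch takes a plain dict of
reference attributes rather than a Docutils node, the function name is
hypothetical, and it leaves out how a "refname" would be turned into a
"linkend" ID:

```python
from xml.sax.saxutils import quoteattr


def reference_tags(attributes):
    """Return the (start, end) DocBook tags for a reference.

    External targets ("refuri") become <ulink url="...">; internal
    targets ("refid") become <link linkend="...">.
    """
    if "refuri" in attributes:
        return "<ulink url=%s>" % quoteattr(attributes["refuri"]), "</ulink>"
    if "refid" in attributes:
        return "<link linkend=%s>" % quoteattr(attributes["refid"]), "</link>"
    # A bare "refname" would first have to be resolved to an ID (not shown).
    raise ValueError("reference has neither refuri nor refid")


start, end = reference_tags({"refuri": "http://www.jedit.org/"})
print(start + "jEdit" + end)
```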

> - visit_section: Since DocBook has recursive "section" elements, they
>   could be used to avoid the 5-level limit.  But it depends if the
>   client software can handle recursive sections.

As mentioned above, I changed from sect[1-5] to nested "section" elements,
but will revert this if it looks like it will be a problem -- I haven't
really tested to see whether there are output differences.

> - "# QUESTION: could this be mapped to something else, since we
>   already have emphasis?
>   def visit_strong(self, node):"
> 
>   DocBook seems to have just a single generic emphasis.  Perhaps
>   '<emphasis role="strong">'?  Using "role" sucks, but it may be
>   inevitable.  The semantics here are weak anyhow, so it doesn't
>   matter.

I added role="strong".

> - "# QUESTION: does anything need to be done for this?
>    def visit_substitution_definition(self, node):"
> 
>   No.  In fact, it should be ``raise nodes.SkipNode``, with no
>   ``depart_substitution_definition``.  Changed in html4css1.py.

Done.

> - "# QUESTION: does anything need to be done for this?
>    def visit_substitution_reference(self, node):"
> 
>   No.  It's an error for a "substitution_reference" to remain in the
  document; ``unimplemented_visit`` will catch it.  You don't need a
>   depart method.

Done.

> - "def visit_thead(self, node):
>        # QUESTION: what is this for?
>        #self.body.append(self.context.pop())
>        # QUESTION: what is this for?
>        self.context.append('')"
> 
>   It's not relevant to DocBook.  It's just a trick to get the best
>   table behavior I could out of HTML.  See ``visit_tgroup`` and
>   ``visit_tbody`` (HTML).

Okay, I cleaned up.

> Now a question for you.  What editor is the stuff on the last line 
> for?

The editor is jEdit. I'll try to keep the evangelizing to a minimum, but
I can't completely hold my tongue.

You can script it with Jython (& BeanShell); it has a plugin for
validating XML (and XML auto-completion), code folding, and many more
nice features.

You can find it here: http://www.jedit.org/

Here's a screen-shot of me working on the docbook writer, which will
also explain all the "foo: ..." comments.

http://newtonsplace.net/jedit/jedit_docbook.png

Also, if you're curious, you can find some sample macros for scripting
jEdit with Jython here:

http://newtonsplace.net/jedit/

Here's a link to the latest version of the writer:

http://newtonsplace.net/docbook.py 

It's more complete at this point, but is still very rough and buggy.

Again, thanks for all your help.

-Ollie