Document titles (was RE: [Docstring-develop] DPS - possible bugs/features)

Ueli Schläpfer gave a good explanation of why David Goodger had chosen
to do something I didn't understand - namely, to make a document that
starts with a title contain no section at all (except it was subtler
than that).

By the way, I did check the documentation, and it seemed to me that the
current documentation indicated that a title would cause a section to be
started - so David, if you want to perform this promotion, then it needs
to be documented (unless I've missed it *again*).

On the other hand, I don't actually see why the DPS system should have
to do the promotion for the user, when it's not clear that it is always
wanted (in other words, why is it up to the DPS system to decide that
the case of a single section is special, and then have a sudden
disjunction in its behaviour for two sections - that's my inner pedant
objecting!).

Regardless, an immediate solution to resolve the docstring case (and
possibly a useful thing to do anyway) would be to have an argument to
the Parser that states upfront that we are working on a document
*fragment* - that is, something that is going to be "stitched in" by
hand, later on, to an existing DPS tree. I would imagine that we may
come across other actions in the future that are sensible in the context
of a full document, but not in the case of a fragment.
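
To make the proposal concrete, here is a minimal sketch of what such a
flag might look like. It is not the real DPS Parser API: the names
``Section``, ``Document``, ``build_tree`` and the ``fragment`` argument are
all made up for illustration, and the promotion rule is a deliberately
simplified version of what David's parser does.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Section:
    title: str
    body: List[str] = field(default_factory=list)


@dataclass
class Document:
    title: Optional[str] = None
    body: List[str] = field(default_factory=list)
    sections: List[Section] = field(default_factory=list)


def build_tree(sections: List[Section], fragment: bool = False) -> Document:
    """Toy stand-in for the parser's final step (hypothetical, not DPS code).

    With fragment=False, a lone top-level section is promoted: its title
    becomes the document title and its body becomes the document body.
    With fragment=True, nothing is promoted and the section is kept.
    """
    if not fragment and len(sections) == 1:
        only = sections[0]
        return Document(title=only.title, body=list(only.body))
    return Document(sections=list(sections))


full = build_tree([Section("Overview", ["Some text."])])
frag = build_tree([Section("Overview", ["Some text."])], fragment=True)
print(full.title, len(full.sections))  # Overview 0
print(frag.title, len(frag.sections))  # None 1
```

The point of the flag is that the docstring case always gets the second
behaviour, so the code that stitches fragments into a larger tree never
has to undo a promotion.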

As to HTML. The normal convention in HTML is that the document title
(that is, the thing in <title> ) be the same as the <h1> title, and that
there be one (and only one) <h1> in a document. The <title> is then
displayed on the window as decoration (e.g., as the text on the top of
the browser window). This is a strong convention, but is (of course) not
part of the standard.

A relevant cutting from HTML4 is:

    Every HTML document must have a TITLE element in the HEAD
    section.

    Authors should use the TITLE element to identify the contents
    of a document. Since users often consult documents out of
    context, authors should provide context-rich titles. Thus,
    instead of a title such as "Introduction", which doesn't
    provide much contextual background, authors should supply
    a title such as "Introduction to Medieval Bee-Keeping" instead.

    For reasons of accessibility, user agents must always make the
    content of the TITLE element available to users (including
    TITLE elements that occur in frames). The mechanism for doing
    so depends on the user agent (e.g., as a caption, spoken).

Broadly, common HTML practice treats a document as having a single title
at the top, which is used for both <title> and <h1>, and the "section
hierarchy" (if any) starts with an <h2>.
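
A small sketch of that convention, assuming the document title is
already known and the sections arrive as (depth, heading) pairs. The
function name ``render_html`` and the input shape are assumptions made
for this example, not part of any DPS writer.

```python
from html import escape
from typing import List, Tuple


def render_html(title: str, sections: List[Tuple[int, str]]) -> str:
    """Render a title plus (depth, heading) pairs; depth 1 is top level."""
    lines = [
        "<html>",
        "<head><title>%s</title></head>" % escape(title),
        "<body>",
        # The same text is used for <title> and for the single <h1>.
        "<h1>%s</h1>" % escape(title),
    ]
    for depth, heading in sections:
        # Section headings start one level below the title; HTML stops at h6.
        level = min(depth + 1, 6)
        lines.append("<h%d>%s</h%d>" % (level, escape(heading), level))
    lines.append("</body>")
    lines.append("</html>")
    return "\n".join(lines)


page = render_html("Bees & Hives", [(1, "History"), (2, "Skeps")])
```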

That makes HTML easy - it doesn't, of course, address any of Ueli's
other points.

Maybe (horrors) we should reserve one specific markup form to mean
"overall title"::

    ==============
    Document title
    ==============

    ===================
    This is not allowed
    ===================
    (because it is a form reserved for the document
    title, of which we may only have one).

Or perhaps we'll have to resort to::

    :Title: Document title

Somehow, I don't see David liking either of those...

Anyway, to some specific comments:

Ueli wrote:
> This makes sense to me, so I consider it a feature.  I'm actually not
> sure how I'd be able to give a title to the document as a whole if the
> parser worked as you expected (unless you special-cased the first
> section level!)  (Explicitly discriminating between a document title
> and regular section titles doesn't count here.)

Hmm. Having concentrated on the HTML case (sorry, it's what I've been
working on) I hadn't seen the distinction, of course.

My problem is that I'm trying to write formatters for *any* document
that might come in (yes, I know I'm writing pydps/pysource, but I want
the Writer to work for any document), so we have to be able to cope
with:

1. Document with no titles at all
2. Document with one title (OK - David does that)
3. Document with more than one title (at the same level)
   - which in essence *really* resolves back to case 1.

I'm afraid that the only "perfect" solution I can see for that (in the
sense of *predictable*) is to require the user to indicate that they
*do* have a document title, and that it is *this* thing, here. That then
makes them aware of the problem, also, which I think is a necessary
thing (otherwise, surprise will eventuate).
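
As an illustration of how predictable the explicit form would be, here
is a sketch that looks for the ``:Title:`` field suggested earlier. That
field is only a proposal in this message, not existing markup, and
``explicit_title`` is a hypothetical helper.

```python
import re
from typing import Iterable, Optional

# Matches a line such as ":Title: Document title" (proposed syntax only).
_TITLE_FIELD = re.compile(r"^:Title:\s*(.+?)\s*$")


def explicit_title(lines: Iterable[str]) -> Optional[str]:
    """Return the declared document title, or None if there is none.

    Section titles are never consulted, so documents with zero, one or
    several top-level sections are all handled by the same rule.
    """
    titles = []
    for line in lines:
        match = _TITLE_FIELD.match(line)
        if match:
            titles.append(match.group(1))
    if len(titles) > 1:
        raise ValueError("more than one :Title: field")
    return titles[0] if titles else None
```

With this rule, cases 1 and 3 above both simply return ``None`` unless the
author has said otherwise.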

Ueli wrote:
> Now, it seems to me that the structures of documents and sections are
> close relatives:
>
> - A document may or may not have a title, a section always has one.
> - A document may have a subtitle, bibliographic elements, and an
>   abstract.  A section has none of these.
> - The rest of the content follows the same model.
>
> Can thus sections be treated as simpler cases of documents (instead of
> the other way round, which is how I understand your post)?  I'm not
> sure how I would exploit this, though...

I'm not sure either (nor if one should), but your analysis is clearly
correct. Again, I've been too focused on the HTML case.

David Goodger wrote:
> It is actually intended that by the time the document tree
> gets to the writer, it must have a title. The parser can't
> always determine the title by itself, such as in PySource
> mode. The PySource reader is expected to supply all the
> titles as appropriate.

Hmm. In PySource mode, the parser should not be trying to introduce
titles - it is, after all, handling arbitrary document fragments, and
can't know anything about their global scope (unless it is told!).

*If* the final tree shall always have a title, where does it come from
if the document author didn't provide one? Surely in that case it is not
up to the *parser* to decide on what a title should be - that is up to
the application. So one has three options:

1. The parser makes one up (yuck)
2. The application makes one up (yuck)
3. An error is generated (yuck)

I'd vote for 3, not least for ease of explanation (I cite "there are no
complex rules about making up document titles" versus "well, if the
document is one section, we'll use the section title as a title, and not
have any sections, but if the document is zero sections we'll do XXX and
if the document is two or more sections we'll do YYY" - sorry to harp on
the point!).
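
Option 3 is also the shortest to write down. A sketch, with
``MissingTitleError`` and ``require_title`` invented for the purpose (they
are not DPS names):

```python
from typing import Optional


class MissingTitleError(ValueError):
    """Raised when a complete document reaches the writer without a title."""


def require_title(title: Optional[str]) -> str:
    """Return the title unchanged in substance, or fail loudly."""
    if title is None or not title.strip():
        raise MissingTitleError(
            "document has no title; declare one explicitly"
        )
    return title.strip()
```

There is no rule here that depends on how many sections the document
has, which is the whole attraction.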

Ueli seems, to me, to be partially arguing for the case I want in his
next message. He also writes:

> The source filename isn't known to the writer, is it?

Well, it is in the pysource/pydps case (and I don't see why it shouldn't
be elsewhere) - I attach a "filename" attribute to the Package and
Module elements (which later on will become appropriate DPS structures,
of course).

> Still, say I want ``<title>filename.rtxt</title>`` in HTML, but I
> definitely don't want ``\title{filename}`` in LaTeX.  How about
> giving the title a "generated" attribute?  Then it's left to the
> writer to use (or ignore) it, but any document could be required to
> have a title.  (Which would mean to update the DTD.)

*If* David still really wants to produce a title of his own, then yes,
that's a good distinction to make.
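
A sketch of how a writer might use such an attribute, following Ueli's
HTML versus LaTeX example. The ``Title`` class and the two helper
functions are hypothetical, and no change to the DTD is implied by them.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Title:
    text: str
    generated: bool = False  # True if made up (e.g. from the filename)


def html_title(title: Title) -> str:
    """HTML needs a <title>, so a generated one is still used."""
    return "<title>%s</title>" % escape(title.text)


def latex_title(title: Title) -> str:
    """LaTeX output ignores generated titles; returns '' for them.

    No LaTeX escaping is attempted in this sketch.
    """
    if title.generated:
        return ""
    return "\\title{%s}" % title.text


made_up = Title("filename.rtxt", generated=True)
real = Title("Document title")
```

Each writer makes its own decision, and the tree itself can still be
required to carry a title.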

> (BTW, my first idea was to add a "sourcefile=filename.rtxt"
> attribute to the document.  I like the "generated" much better,
> though!)

I think that the sourcefile as an optional attribute on the document is
probably a useful thing, as well.

Anyway, this is difficult stuff (in terms of folding it into something
easy to use and remember, not in terms of implementing!), so I await
David's responses with interest.

Tibs

--
Tony J Ibbs (Tibs)      http://www.tibsnjoan.co.uk/
Give a pedant an inch and they'll take 25.4mm
(once they've established you're talking a post-1959 inch, of course)
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)