[G. Milde - 2008-05-15 05:42]
> On 14.05.08, David Goodger wrote:
>> I don't think that a Writer should handle unrecognized formats.
>> Recognizing formats and dealing with input is the job of a Parser or
>> Reader. A Writer's job is to convert a standard doctree to the output
>> format.
>> Yes, I realize that this is dealing with a distinct data format, so
>> the model may break down. But this is something to keep in mind.
> We should take care to differentiate between *input*, *internal*, and
> *output* formats.
> The basic question for inclusion of a math-directive is "How should the
> standard doctree store the content of a math node?" I.e., which
> *internal* format should docutils use.
That depends on the stage of processing. The reST parser cannot deal with the
math input, because it's not reST. The math directive can't deal with the math
input, because it has no idea what the output format should be. The Writer
should not deal with the math input, because that's not its job and it would
cause a duplication of effort. A Transform is the right place to deal with the
math input conversion, because it has all the necessary information.
The math input data should be stored in a "pending" node in the original input
format. The Transform will convert that to something compatible with the target
Writer. The internal format for math in the Docutils doctree is the input
format (prior to the Transform running), and the output format (after the
Transform has run).
I have no interest in making a generic math-doctree; it would double the size of
the doctree spec. As far as Docutils is concerned, math is a blob. And there
is no reason to do a double conversion; it's imperfect. Let's just use a
standard format internally: the input format, LaTeX.
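To make the flow concrete, here is a minimal sketch in plain Python. It is not
the real Docutils API: `PendingMath`, `MathBlob`, `CONVERTERS`, and
`math_transform` are hypothetical names, and the converters are placeholders
that do no real LaTeX processing.

```python
from dataclasses import dataclass


@dataclass
class PendingMath:
    """Hypothetical stand-in for a "pending" node: math kept in its input format."""
    source: str  # LaTeX, exactly as the author typed it


@dataclass
class MathBlob:
    """Hypothetical node holding ready-to-insert output; opaque to the Writer."""
    format: str
    data: str


# Placeholder converters keyed by math output format (assumed names).
CONVERTERS = {
    "latex": lambda src: "$" + src + "$",
    "mathml": lambda src: "<math><mtext>" + src + "</mtext></math>",
}


def math_transform(nodes, math_format):
    """Return a new node list with every PendingMath replaced by a MathBlob."""
    convert = CONVERTERS[math_format]
    return [
        MathBlob(math_format, convert(n.source)) if isinstance(n, PendingMath) else n
        for n in nodes
    ]


doctree = ["Euler: ", PendingMath(r"e^{i\pi} + 1 = 0")]
result = math_transform(doctree, "latex")
```

The point of the sketch is that the conversion happens once, in one place, and
the Writer only ever sees the finished `MathBlob`.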
If you're still not convinced, let me hammer it home. There's one edge case
that seals the deal: publishing to and from a doctree. A document can be
processed to a pickled doctree, stored in a database, then later the doctree can
be retrieved and processed into a concrete target format. In this case, the
Transform should leave the "pending" node alone during the first run, since
there *is* no target format yet.
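A rough sketch of that round trip, again in plain Python rather than actual
Docutils code. The dict-based nodes and the `math_transform` function are
assumptions made for illustration; only `pickle` is real.

```python
import pickle


def math_transform(nodes, math_format=None):
    """Convert pending math nodes, or leave them untouched if no target is known."""
    if math_format is None:
        return nodes  # first run: publishing to a doctree, nothing to convert to
    converted = []
    for node in nodes:
        if isinstance(node, dict) and node.get("type") == "pending_math":
            # Placeholder conversion; a real one would depend on math_format.
            node = {"type": "math_blob", "format": math_format, "data": node["source"]}
        converted.append(node)
    return converted


doctree = ["Area: ", {"type": "pending_math", "source": r"\pi r^2"}]

# First run: no Writer yet, so the pending node survives and is pickled as-is.
stored = pickle.dumps(math_transform(doctree))

# Later run: the doctree is retrieved and a concrete target is now known.
restored = pickle.loads(stored)
final = math_transform(restored, math_format="latex")
```

Because the stored doctree still holds the original LaTeX, the later run can
target any math output format without a second, lossy conversion.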
> The set of supported input and output formats can be extended later
> without change to the doctree specification.
> IMO, the main candidates for the *internal* data format are:
> best graphical representation, relatively simple to type in directly,
> widely supported and established in the scientific community.
> modern data-exchange format, standardised, "the future",
> hard to type in by hand.
> Unfortunately, conversion between them is not always lossless, so it is
> desirable to keep input data that is in one of them in this format if
> the *output* data requires the same.
No, it's desirable to keep the doctree data in the input format (whatever that
is) until it's converted into the target format.
> IMO, the Transform should convert the *input* format into the *internal*
> format (the one required by the current writer or both), normalising the
No, this adds too much complication to the Writers. Math markup is very
specialized, and should be processed in one place only. That one place could be
a whole module, or even a package, but it should not be distributed over
multiple Writers. This is NOT the job of a Writer!
> Jens' latex-math provides the code for a LaTeX->MathML Transform,
> searching for a suitable MathML-LaTeX converter is the next important
No double conversion, please.
>> By the time the Writer sees it, the math should be just a blob to
>> insert into the output stream.
> In the most basic cases, yes. But generally a writer will convert the
> *internal* format of the standard doctree to the *output* format.
The internal format for math is the input format.
> * the html+mathml writer just inserts the Math ML,
There is no html+mathml Writer. There's an HTML Writer, that's all.
> * the latex writer inserts LaTeX code.
> However, some html writer variants (or options) would care for older
> browsers not understanding MathML
> * "html+pngmath" would produce graphical representations of the
> formulae from the LaTeX data (a la latex2html),
> * "html+htmlmath" would convert the MathML to an HTML+CSS
> * "html-jsmath" would write HTML+JavaScript for the jsmath extension...
> Other writers are feasible as well, e.g.
> * a "unicode" writer could convert the math node content to a textual
> representation using the Unicode chars for math symbols.
> (Unicode defines "all possible" mathematical symbols, using a
> fixed-width font, even large symbols (spanning multiple lines) can be
I want the Transform to take care of these cases, not the Writers. The option
will affect the Transform. The logical place for the variation is in the
Transform (singular), NOT in the Writers (plural).
Math output formats do not correspond one-to-one with document output formats.
It's an M-to-N relationship.
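One way to picture the M-to-N relationship is a lookup that the single
Transform consults. This is only a sketch: the `SUPPORTED` table, its entries,
the "first entry is the default" rule, and `choose_math_format` are all
assumptions, not existing Docutils settings.

```python
# Hypothetical table: which math output formats each document format can embed.
# "unicode" appears under two document formats, which is what makes it M-to-N.
SUPPORTED = {
    "html": ["mathml", "png", "html+css", "jsmath", "unicode"],
    "latex": ["latex"],
    "text": ["unicode"],
}


def choose_math_format(document_format, option=None):
    """Pick the math output format for one run of the single Transform."""
    choices = SUPPORTED[document_format]
    if option is None:
        return choices[0]  # assumed default: the first listed format
    if option not in choices:
        raise ValueError(f"{option!r} is not usable in {document_format!r} output")
    return option
```

With a table like this, an option such as `choose_math_format("html", "png")`
changes what the Transform produces, and no Writer needs to know about it.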
Again, this specialized processing is NOT the responsibility of Writers!
David Goodger <http://python.net/~goodger>