From: David Goodger <goodger@py...>  2008-05-15 13:04:10

[G. Milde 2008-05-15 05:42]
> On 14.05.08, David Goodger wrote:
>> I don't think that a Writer should handle unrecognized formats.
>> Recognizing formats and dealing with input is the job of a Parser or
>> Reader. A Writer's job is to convert a standard doctree to the output
>> format.
>
>> Yes, I realize that this is dealing with a distinct data format, so
>> the model may break down. But this is something to keep in mind.
>
> We should care to differentiate between *input*, *internal*, and *output*
> format.
>
> The basic question for inclusion of a math-directive is "How should the
> standard doctree store the content of a math node?" I. e. which
> *internal* format should docutils use.

That depends on the stage of processing.

The reST parser cannot deal with the math input, because it's not
reST. The math directive can't deal with the math input, because it
has no idea what the output format should be. The Writer should not
deal with the math input, because that's not its job and it would
cause a duplication of effort. A Transform is the right place to deal
with the math input conversion, because it has all the necessary
information.

The math input data should be stored in a "pending" node in the
original input format. The Transform will convert that to something
compatible with the target Writer. The internal format for math in
the Docutils doctree is the input format (prior to the Transform
running), and the output format (after the Transform runs).

I have no interest in making a generic math-doctree; it would double
the size of the doctree spec. As far as Docutils is concerned, math
is a blob. And there is no reason to do a double conversion; it's
imperfect. Let's just use a standard format internally: the input
format, LaTeX.

If you're still not convinced, let me hammer it home. There's one
edge case that seals the deal: publishing to and from a doctree.
A document can be processed to a pickled doctree, stored in a
database, then later the doctree can be retrieved and processed into a
concrete target format. In this case, the Transform should leave the
"pending" node alone during the first run, since there *is* no target
format yet.

> The set of supported input and output formats can be extended later
> without change to the doctree specification.

Yes.

> IMO, the main candidates for the *internal* data format are:
>
> LaTeX
>   best graphical representation, relatively simply to type in directly,
>   widely supported and established in the scientific community.
>
> MathML
>   modern data-exchange format, standardised, "the future",
>   hard to type in by hand.
>
> Unfortunately, conversion between them is not always lossless, so it is
> desirable to keep input data that is in one of them in this format if
> the *output* data requires the same.

No, it's desirable to keep the doctree data in the input format
(whatever that is) until it's converted into the target format.

> IMO, the Transform should convert the *input* format into the *internal*
> format (the one required by the current writer or both), normalising the
> doctree.

No, this adds too much complication to the Writers. Math markup is
very specialized, and should be processed in one place only. That one
place could be a whole module, or even a package, but it should not be
distributed over multiple Writers. This is NOT the job of a Writer!

> Jens' latexmath provides the code for a LaTeX->MathML Transform,
> searching for a suitable MathML-LaTeX converter is the next important
> step.

No double conversion, please.

>> By the time the Writer sees it, the math should be just a blob to
>> insert into the output stream.
>
> In the most basic cases, yes. But generally a writer will convert the
> *internal* format of the standard doctree to the *output* format.

The internal format for math is the input format.

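
To make the staged handling described above concrete, here is a
minimal sketch in plain Python. It is not Docutils code: the Node
class, the CONVERTERS table, and the math_transform and write
functions are all hypothetical names made up for illustration, and the
MathML converter is only a stub. It shows three things: the math stays
in a "pending" node as LaTeX source, a single transform resolves it
once a target format is known (and leaves it alone otherwise), and the
writer only ever inserts a finished blob.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy doctree node (hypothetical; not the docutils.nodes API)."""
    kind: str                       # "document", "paragraph", "pending", "raw"
    text: str = ""
    details: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def latex_to_mathml_stub(latex):
    # Stub only: a real converter would parse the LaTeX source.
    return "<math><mtext>" + latex + "</mtext></math>"


# All math conversion lives in one place, keyed by math output format.
CONVERTERS = {
    "latex": lambda src: "$" + src + "$",
    "mathml": latex_to_mathml_stub,
}


def math_transform(node, math_format=None):
    """Resolve pending math nodes in place.

    With math_format=None (no target format yet, e.g. when publishing
    to a stored doctree) the pending nodes are left untouched.
    """
    for i, child in enumerate(node.children):
        if child.kind == "pending" and child.details.get("type") == "math":
            if math_format is not None:
                blob = CONVERTERS[math_format](child.text)
                node.children[i] = Node("raw", blob, {"format": math_format})
        else:
            math_transform(child, math_format)
    return node


def write(node):
    """Toy writer: math arrives as an opaque blob and is inserted as-is."""
    if node.kind == "pending":
        raise ValueError("unresolved pending node reached the writer")
    if node.kind == "raw":
        return node.text
    return node.text + "".join(write(child) for child in node.children)


# Usage: first run without a target, second run with one.
doc = Node("document", children=[
    Node("paragraph", "Euler: ", children=[
        Node("pending", r"e^{i\pi}+1=0", {"type": "math"}),
    ]),
])
math_transform(doc)            # no target format: pending node is kept
math_transform(doc, "latex")   # target known: pending node becomes a blob
output = write(doc)
```

Supporting another math output format in this sketch means adding one
entry to the converter table; the toy writer does not change.
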
> * the html+mathml writer just inserts the Math ML,

There is no html+mathml Writer. There's an HTML Writer, that's all.

> * the latex writer inserts LaTeX code.
>
> However, some html writer variants (or options) would care for older
> browsers not understanding MathML
>
> * "html+pngmath" would produce graphical representations of the
>   formulae from the LaTeX data (a la latex2html),
>
> * "html+htmlmath" would convert the MathML to a HTML+CSS
>   substitution,
>
> * "htmljsmath" would write HTML+javascript for the jsmath extension...
>
> Other writers are feasible as well, e.g.
>
> * a "unicode" writer could convert the math node content to a textual
>   representation using the Unicode chars for math symbols.
>
>   (Unicode defines "all possible" mathematical symbols, using a
>   fixed-width font, even large symbols (spanning multiple lines) can be
>   constructed.)

I want the Transform to take care of these cases, not the Writers.
The option will affect the Transform. The logical place for the
variation is in the Transform (singular), NOT in the Writers (plural).

Math output formats do not correspond one-to-one with document output
formats. It's an M-to-N relationship. Again, this specialized
processing is NOT the responsibility of Writers!

--
David Goodger <http://python.net/~goodger>