From: Felix W. <Fel...@gm...> - 2004-10-16 20:30:04
David Goodger wrote:

> Docutils doesn't do text transformations like that (changing "..." to
> a single ellipsis character).  For the reasons why, please see
> <http://docutils.sf.net/docs/dev/rst/alternatives.html#character-processing>.

It reads:

| Docutils has no need of a character entity subsystem.  Supporting
| Unicode and text encodings, character entities should be directly
| represented in the text: a copyright symbol should be represented by
| the copyright symbol character.

For the copyright sign, it is indeed a good idea to enter it directly.
However, for some characters, the direct Unicode representation looks
unnatural in plain text.  For example, the normal dash, the en-dash and
the em-dash are hardly distinguishable in a monospaced font.  And a
'true' ellipsis would be rendered much too narrow in monospace.

And even if it is possible to enter such characters, it is not
intuitive.  reStructuredText often has to be edited by people who are
not familiar with the markup language.  Such people normally do not
enter non-ASCII characters if there are ASCII equivalents (e.g., they
would write "--" instead of an en-dash), and if they were told to enter
Unicode symbols, they would find it extremely inconvenient (I do, too).

In fact, it's very natural to write en-dashes as two normal dashes --
like this.  Or maybe also em-dashes---like this.  And you'd always
write ellipses like this...  Or like this ...

The LaTeX writer already does the en-/em-dash transformation (because
LaTeX automatically transforms '--' into a real en-dash and the LaTeX
writer doesn't escape dashes), and I have been using it and found it
quite convenient.  However, sometimes this behavior is undesired,
e.g. when typing options, like --stylesheet (without surrounding
``literal quotes``).

An intelligent replacement mechanism in the reStructuredText parser
would fix this problem, because it could transform "foo -- bar", but
not "foo--bar" nor " --bar".
It could likewise transform "foo---bar", but not "foo --- bar" (I
think), and also not "-----" (sometimes people might use such repeated
dashes e.g. to render arrows).  Such an intelligent mechanism would
greatly simplify writing reStructuredText.

For ellipses, I'm not entirely sure what to do.  For HTML, the …
ellipsis is often too narrow, but for LaTeX, it would be good to have
"foo..." and "foo ..." both transformed to "foo\,\dots" ("foo",
narrow-space, ellipsis).  As this (narrow-space + ellipsis) does not
lead to very good results for HTML, writer-dependent handling would
probably be necessary if this were to be implemented in the reST
parser.  Thus I think it would be a better idea to implement ellipsis
support in the LaTeX writer, where it's actually necessary.  If I find
the time, I can post a patch.

So I propose the following:

* Add intelligent en-dash and em-dash transformation to the reST
  parser.

* Add intelligent ellipsis transformation to the LaTeX writer.

| If this is not possible in an authoring environment, a pre-processing
| stage can be added,

Not really.  A pre- or post-processor cannot distinguish between
literal (= monospaced) and normal text, just to name one problem.

| or a table of substitution definitions can be devised.

Substitutions are not very readable and need to be learned by human
document writers.  And after all, I don't see any disadvantages in
adding an automatic transformation.

--
When replying to my email address, please ensure that the mail header
contains 'Felix Wiemann'.

http://www.ososo.de/
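
To make the rules proposed above concrete, here is a minimal sketch in
Python.  It is not Docutils code and not the patch mentioned above; the
names smart_dashes and latex_ellipsis and the exact regular expressions
are assumptions made up for illustration.  It works on a plain string
and knows nothing about literal text, so it would have to be applied
only to non-literal text inside the parser or writer.

```python
import re

EN_DASH = "\u2013"
EM_DASH = "\u2014"

# "--" with one space on each side, between two characters that are
# neither whitespace nor dashes ("foo -- bar").
_EN_RE = re.compile(r"(?<=[^\s-]) -- (?=[^\s-])")

# "---" directly between two characters that are neither whitespace
# nor dashes ("foo---bar").
_EM_RE = re.compile(r"(?<=[^\s-])---(?=[^\s-])")

# Exactly three dots after a word character, optionally preceded by a
# single space ("foo..." or "foo ...").
_ELLIPSIS_RE = re.compile(r"(?<=\w) ?\.{3}(?!\.)")


def smart_dashes(text):
    """Hypothetical helper: replace spaced "--" and unspaced "---"."""
    text = _EM_RE.sub(EM_DASH, text)
    # Keep the surrounding spaces around the en-dash.
    return _EN_RE.sub(" " + EN_DASH + " ", text)


def latex_ellipsis(text):
    """Hypothetical helper: emit narrow space plus \\dots for LaTeX."""
    # A function is used as the replacement so that the backslashes
    # are inserted literally.
    return _ELLIPSIS_RE.sub(lambda match: r"\,\dots", text)


if __name__ == "__main__":
    for sample in ["foo -- bar", "foo--bar", " --bar", "foo---bar",
                   "foo --- bar", "-----", "use --stylesheet here"]:
        print(repr(sample), "->", repr(smart_dashes(sample)))
    for sample in ["foo...", "foo ..."]:
        print(repr(sample), "->", repr(latex_ellipsis(sample)))
```

Under these assumptions, options such as --stylesheet and runs such as
"-----" are left alone because the patterns require specific
neighbouring characters.  Whether "foo --- bar" should stay untouched
is an open question in the proposal; the sketch leaves it unchanged.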