[Docutils-users] Re: rendering ellipsis_

David Goodger wrote:

> Docutils doesn't do text transformations like that (changing "..." to
> a single ellipsis character).  For the reasons why, please see
> <http://docutils.sf.net/docs/dev/rst/alternatives.html#character-processing>.

It reads:

| Docutils has no need of a character entity subsystem. Supporting
| Unicode and text encodings, character entities should be directly
| represented in the text: a copyright symbol should be represented by
| the copyright symbol character.

For the copyright sign, it's indeed a good idea to enter it directly.
However, for some characters, the direct Unicode representation looks
unnatural in plain text.

For example, a normal dash, an en-dash and an em-dash are hardly
distinguishable in a monospaced font.  And a 'true' ellipsis would be
rendered much too narrow in monospace.

And even where it's possible to enter such characters, it is not
intuitive.  reStructuredText often has to be edited by people who are
not familiar with the markup language.  Such people normally do not
enter non-ASCII characters when an ASCII alternative exists (e.g., they
would write "--" instead of an en-dash), and if they were told to enter
Unicode symbols, they would find it extremely inconvenient (I do, too).

In fact, it's very natural to write en-dashes as two normal dashes --
like this.  Or maybe also em-dashes---like this.  And you'd always write
ellipses like this...  Or like this ...

The LaTeX writer already does the en-/em-dash transformation (because
LaTeX automatically transforms '--' into a real en-dash and the LaTeX
writer doesn't escape dashes), and I have been using them and found them
quite convenient.  However, sometimes this behavior is undesired,
e.g. when typing options, like --stylesheet (without surrounding
``literal quotes``).

An intelligent replacement mechanism in the reStructuredText parser
would fix this problem, because it could transform "foo -- bar", but not
"foo--bar" nor " --bar".  And "foo---bar", but not "foo --- bar" (I
think), and also not "-----" (sometimes people might use such repeated
dashes e.g. to render arrows).
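
To make the rules above concrete, here is a minimal sketch in Python.
It is not existing Docutils code; the name smart_dashes and the two
regular expressions are only my assumption of how such rules could be
written, and a real implementation would live in the parser so that it
can skip literal text.

```python
import re

EN_DASH = "\u2013"
EM_DASH = "\u2014"

# "--" with whitespace on both sides, as in "foo -- bar".
_EN_RE = re.compile(r"(?<=\s)--(?=\s)")
# "---" directly between two word characters, as in "foo---bar".
_EM_RE = re.compile(r"(?<=\w)---(?=\w)")


def smart_dashes(text):
    """Hypothetical helper: replace only the unambiguous dash patterns."""
    text = _EM_RE.sub(EM_DASH, text)
    text = _EN_RE.sub(EN_DASH, text)
    return text


for sample in ["foo -- bar", "foo--bar", " --bar",
               "foo---bar", "foo --- bar", "-----"]:
    # ascii() keeps the output readable on terminals without Unicode.
    print(ascii(sample), "->", ascii(smart_dashes(sample)))
```

Only "foo -- bar" and "foo---bar" are changed; the other four samples
come back untouched, which matches the cases listed above.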

Such an intelligent mechanism would greatly simplify inputting
reStructuredText.

For ellipses, I'm not entirely sure what to do.  For HTML, the &hellip;
ellipsis is often too narrow, but for LaTeX, it would be good to have
"foo..."  and "foo ..." both transformed to "foo\,\dots" ("foo",
narrow-space, ellipsis).  As this (narrow-space + ellipsis) does not
lead to very good results for HTML, writer-dependent handling would
probably be necessary if this were to be implemented in the reST parser.
Thus I think it would be a better idea to implement ellipsis support in
the LaTeX writer, where it's actually needed.  If I find the time, I
can post a patch.
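
Roughly, the LaTeX-side replacement could look like the following
sketch.  The function name latex_ellipsis is hypothetical and this is
not the patch itself, only an illustration of the rule "foo..." and
"foo ..." both become "foo\,\dots".  Runs of four or more dots are
left alone, which is my own assumption.

```python
import re

# Exactly three dots after a word character, with an optional single
# space in between.  The negative lookahead rejects longer dot runs.
_ELLIPSIS_RE = re.compile(r"(?<=\w) ?\.\.\.(?!\.)")


def latex_ellipsis(text):
    """Hypothetical helper: emit narrow space + \\dots for an ellipsis."""
    # A function is used as the replacement so that the backslashes
    # are inserted literally instead of being parsed as escapes.
    return _ELLIPSIS_RE.sub(lambda match: r"\,\dots", text)


print(latex_ellipsis("foo..."))
print(latex_ellipsis("foo ..."))
print(latex_ellipsis("foo...."))
```

A real writer would also have to deal with the space that TeX swallows
after a control word such as \dots when more text follows.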

So I propose the following:

* Add intelligent en-dash and em-dash transformation to the reST parser.
* Add intelligent ellipsis transformation to the LaTeX writer.

| If this is not possible in an authoring environment, a pre-processing
| stage can be added,

Not really.  A pre- or post-processor cannot distinguish between literal
(= monospaced) and normal text, just to name one problem.
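
A small sketch of the problem, assuming a pre-processor that does a
plain textual replacement (naive_preprocess is a made-up name, not an
existing tool):

```python
def naive_preprocess(source):
    """Hypothetical pre-processor: blind replacement, no reST knowledge."""
    return source.replace(" -- ", " \u2013 ")


source = "Run ``rm -- -f`` to remove a file named -f -- carefully."
result = naive_preprocess(source)

# The dash in the running text is converted as intended ...
print("-f \u2013 carefully" in result)
# ... but so is the one inside the inline literal, which is now wrong.
print("``rm \u2013 -f``" in result)
```

Both checks print True: the replacement also hits the text between the
double backquotes, which should have been left exactly as typed.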

| or a table of substitution definitions can be devised.

Substitutions are not very readable and need to be learned by human
document writers.  And after all, I don't see any disadvantages in
adding an automatic transformation.

-- 
When replying to my email address, please ensure
that the mail header contains 'Felix Wiemann'.

http://www.ososo.de/