[Docutils-users] Re: rendering ellipsis_

David Goodger wrote:

> Felix Wiemann wrote:
>
>> And even if it's possible to enter such characters, it is not
>> intuitive.
>>
>> [...] and if they were told to enter unicode-symbols, they would find
>> it extremely inconvenient (I do, too).
>
> I think that's a failing of the operating systems people use.

Possibly, though I personally rarely need to enter non-ASCII
characters, especially ones outside latin1.  My system is still
latin1-based (mostly because of my laziness), so things get quite
messy if I start entering text in UTF-8 (besides the fact that it's
difficult for me to enter arbitrary Unicode characters).  Sure, it's
my fault, but the need for UTF-8 (and similar cool internationalized
things) isn't urgent enough to make me solve the problem.  I presume
I'm not the only one.

> The world is moving toward internationalization, with Unicode and
> UTF-8 at the forefront.

Which doesn't mean that it gets easier to enter seldom-needed characters
like en-dashes in plain text.

> I hope it gets here soon so we can ignore the issue completely.

Not too soon, if at all.

Practicality counts, IMO.  Docutils should be useful *now*, not in
three years.

>> In fact, it's very natural to write en-dashes as two normal dashes
>> -- like this.  Or maybe also em-dashes---like this.  And you'd
>> always write ellipses like this...  Or like this ...
>
> You illustrate a problem: there is no one standard for such
> transformations.  And there's no way to distinguish between
> "transformation desired" and "leave as-is" in normal text.

I have never seen two dashes surrounded by whitespace in any typeset
text.  I.e., I don't think something like that exists in reality.  Same
for three dashes surrounded by non-whitespace.

And if someone needs it, they can escape one of the dashes:
"foo \-- bar", or "foo -\- bar", or "foo-\--bar", or "foo\---bar".
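
To make the escaping idea concrete, here is a minimal Python sketch.  It
is not the Docutils parser's actual escape handling; the function name
`transform_with_escapes` and the exact patterns are assumptions made for
illustration only.  A backslash makes the following character literal,
and a dash run that contains an escape is never transformed.

```python
import re

EN_DASH = "\u2013"
EM_DASH = "\u2014"

# One pass over the text, trying three alternatives at each position:
#   1. a backslash escape: keep the escaped character literally
#   2. '---' between non-whitespace, non-dash characters: em-dash
#   3. '--' with whitespace on both sides: en-dash
_TOKEN_RE = re.compile(
    r"\\(?P<esc>.)"
    r"|(?<=[^\s-])---(?=[^\s-])"
    r"|(?<=\s)--(?=\s)",
    re.DOTALL,
)


def _replace(match):
    escaped = match.group("esc")
    if escaped is not None:
        # Drop the backslash, keep the character unchanged.
        return escaped
    return EM_DASH if match.group(0) == "---" else EN_DASH


def transform_with_escapes(text):
    """Transform dash runs unless one of their dashes is escaped."""
    return _TOKEN_RE.sub(_replace, text)
```

With this sketch, all four escaped forms above come out with plain ASCII
dashes, while an unescaped "foo -- bar" gets a real en-dash.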

>> The LaTeX writer already does the en-/em-dash transformation
>> (because LaTeX automatically transforms '--' into a real en-dash and
>> the LaTeX writer doesn't escape dashes), and I have been using them
>> and found them quite convenient.
>
> I question whether the LaTeX writer *should* be doing this.

It shouldn't.  We'd get more flexibility if the reST parser did it
instead of the LaTeX writer, because it's currently impossible to escape
anything for the LaTeX writer, but for the reST parser it is possible.
I.e., "foo -\- bar" currently does not escape the dashes, but if the
reST parser did the transformation, it would work.

Furthermore, documents should not be written for one particular writer,
but if the dash feature is only supported by the LaTeX writer, this is
exactly what happens.  (I have documents that rely on "--" being
transformed to an en-dash.)

> At the least it should be an option, disabled by default.

I don't think that's necessary, as it's possible to escape dashes.

>> However, sometimes this behavior is undesired, e.g. when typing
>> options, like --stylesheet (without surrounding ``literal quotes``).
>
> Exactly.

Simple cases like --stylesheet would be caught by a little bit of
intelligence (read: a proper regex), because there is no whitespace
after the two dashes.
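
As a rough illustration of what such a regex could look like, here is a
small Python sketch.  The names (`transform_dashes`, `EN_DASH_RE`,
`EM_DASH_RE`) and the exact rules are assumptions, not existing Docutils
code: "--" is only transformed when it has whitespace on both sides, and
"---" only when it sits directly between non-whitespace characters.

```python
import re

EN_DASH = "\u2013"
EM_DASH = "\u2014"

# Two dashes with whitespace on both sides, as in "foo -- bar".
EN_DASH_RE = re.compile(r"(?<=\s)--(?=\s)")
# Three dashes between non-whitespace, non-dash characters, as in "foo---bar".
EM_DASH_RE = re.compile(r"(?<=[^\s-])---(?=[^\s-])")


def transform_dashes(text):
    """Replace '--' and '---' with real dashes where the context fits."""
    text = EM_DASH_RE.sub(EM_DASH, text)
    return EN_DASH_RE.sub(EN_DASH, text)
```

An option such as --stylesheet is left alone because nothing but the
option name follows the two dashes, and longer runs like "----" match
neither pattern.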

>> So I propose the following:
>>
>> * Add intelligent en-dash and em-dash transformation to the reST
>>   parser.
>> * Add intelligent ellipsis transformation to the LaTeX writer.
>
> You may be opening up a big can of worms.  Once the underlying system
> is there, won't there be a bunch of requests for (potentially
> conflicting) additions?  When will it stop?

The fact that there is one such transformation doesn't mean we will be
adding others, because there are IMO compelling reasons for
dash transformation (namely: monospace rendering, ease of entry and
intuitiveness, common use in plain text, and rather low ambiguity),
which don't exist for the other transformations proposed in
alternatives.txt.

Concerning the ellipsis, I'm not entirely sure it's a good idea to
implement it: it is not *that* important, the transformation may be
undesirable for some languages or style conventions, and it's not
possible to escape special cases because the logic would be
implemented in the LaTeX writer.  So we probably shouldn't do the
ellipsis transformation as long as these problems aren't solved.

>> I don't see any disadvantages in adding an automatic transformation.
>
> I do, because it won't do what I want 100% of the time.  It has to be
> optional.

With sufficient intelligence, it will do so in 99.9% of cases.  (I
can't remember any case where the effect would have been undesired.)
Is escapability sufficient optionality for you?

-- 
When replying to my email address, please ensure
that the mail header contains 'Felix Wiemann'.

http://www.ososo.de/