Thread: [Docutils-users] rendering ellipsis_ | Docutils: Documentation Utilities

[Docutils-users] rendering ellipsis_

From: Kjetil T. H. <kje...@if...> - 2004-10-06 20:19:50

in order to render ``...`` correctly, LaTeX wants ``\ldots``, and HTML
wants ``&hellip;``.  it would be nice if Docutils did this
transformation.

.. _ellipsis: http://en.wikipedia.org/wiki/Ellipsis [#]_
.. [#] has anyone looked at fitting reST-parsing into Gnus, yet? :-)
-- 
Kjetil T.

Re: [Docutils-users] rendering ellipsis_

From: David G. <go...@py...> - 2004-10-07 00:53:38

[Kjetil Torgrim Homme]
> in order to render ``...`` correctly, LaTeX wants ``\ldots``, and HTML
> wants ``&hellip;``.  it would be nice if Docutils did this
> transformation.

Docutils doesn't do text
transformations like that (changing "..." to a single ellipsis
character).  For the reasons why, please see
<http://docutils.sf.net/docs/dev/rst/alternatives.html#character-processing>.

Question 2.7 of the FAQ (http://docutils.sourceforge.net/FAQ.html)
directly addresses this issue.  Short summary: enter the real ellipsis
character, using UTF-8 or another encoding, or a workaround.

The LaTeX writer already translates U+2026 to "\dots".  Is that
equivalent to "\ldots"?  (I'm not a TeX expert.)
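
A minimal sketch of the kind of writer-level character mapping mentioned
above.  It is not the actual Docutils LaTeX writer code; the names
`LATEX_CHAR_MAP` and `encode_for_latex` are hypothetical, and the trailing
`{}` is an added assumption so that the macro name ends cleanly before any
following letters.

```python
# Hypothetical mapping from Unicode code points to LaTeX macros.
LATEX_CHAR_MAP = {
    0x2026: r"\dots{}",  # HORIZONTAL ELLIPSIS
}


def encode_for_latex(text):
    """Replace mapped Unicode characters with their LaTeX equivalents."""
    return text.translate(LATEX_CHAR_MAP)


print(encode_for_latex("wait\u2026 what"))
```

The author types the real ellipsis character once, and each writer decides
how to represent it in its own output format.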

-- 
David Goodger <http://python.net/~goodger>

Re: [Docutils-users] rendering ellipsis_

From: Kjetil T. H. <kje...@if...> - 2004-10-07 02:21:33

On Wed, 2004-10-06 at 20:53 -0400, David Goodger wrote:
> [Kjetil Torgrim Homme]
> > in order to render ``...`` correctly, LaTeX wants ``\ldots``, and HTML
> > wants ``&hellip;``.  it would be nice if Docutils did this
> > transformation.
> 
> Docutils doesn't do text
> transformations like that (changing "..." to a single ellipsis
> character).  For the reasons why, please see
> <http://docutils.sf.net/docs/dev/rst/alternatives.html#character-processing>.

ah.  a text-replace directive looks like a very nice solution for this.
I could then make a file with the transformations I like and include it
in my files.

I can't simply add a simplistic preprocessor (such as a sed script),
since it doesn't know about reST syntax.  it would mangle literals and
usage such as

  Subsubsection
  .............
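
The problem can be illustrated with a short sketch.  `naive_ellipsis` is a
hypothetical stand-in for a sed-style preprocessor; it knows nothing about
reST syntax, so it rewrites section underlines and inline literals along
with ordinary text.

```python
def naive_ellipsis(text):
    """Blindly replace every run of three dots with U+2026."""
    return text.replace("...", "\u2026")


source = (
    "Wait...\n"
    "\n"
    "Subsubsection\n"
    ".............\n"
    "\n"
    "Type ``...`` literally.\n"
)

result = naive_ellipsis(source)
print(result)
# The 13-dot underline is no longer a valid section underline,
# and the inline literal no longer contains three ASCII dots.
```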

> The LaTeX writer already translates U+2026 to "\dots".  Is that
> equivalent to "\ldots"?  (I'm not a TeX expert.)

sorry, I got it wrong.  it's called \ldots in a math environment, so
\dots and $\ldots$ are equivalent.

-- 
Kjetil T.

[Docutils-users] Re: rendering ellipsis_

From: Felix W. <Fel...@gm...> - 2004-10-16 20:30:04

David Goodger wrote:

> Docutils doesn't do text transformations like that (changing "..." to
> a single ellipsis character).  For the reasons why, please see
> <http://docutils.sf.net/docs/dev/rst/alternatives.html#character-processing>.

It reads:

| Docutils has no need of a character entity subsystem. Supporting
| Unicode and text encodings, character entities should be directly
| represented in the text: a copyright symbol should be represented by
| the copyright symbol character.

For the copyright sign, it's indeed a good idea to enter it directly.
However, for some characters, the direct unicode-representation looks
unnatural in plaintext.

For example, normal dash, en-dash and em-dash are hardly distinguishable
in a monospaced font.  And a 'true' ellipsis would be rendered much too
narrow in monospace.

And even if it's possible to enter such characters, it is not intuitive.
reStructuredText is often required to be edited by persons not familiar
with the markup language.  Such persons normally do not enter non-ASCII
characters if there are existing ASCII characters (e.g., they would
write "--" instead of an en-dash) and if they were told to enter
unicode-symbols, they would find it extremely inconvenient (I do, too).

In fact, it's very natural to write en-dashes as two normal dashes --
like this.  Or maybe also em-dashes---like this.  And you'd always write
ellipses like this...  Or like this ...

The LaTeX writer already does the en-/em-dash transformation (because
LaTeX automatically transforms '--' into a real en-dash and the LaTeX
writer doesn't escape dashes), and I have been using them and found them
quite convenient.  However, sometimes this behavior is undesired,
e.g. when typing options, like --stylesheet (without surrounding
``literal quotes``).

An intelligent replacement mechanism in the reStructuredText parser
would fix this problem, because it could transform "foo -- bar", but not
"foo--bar" nor " --bar".  And "foo---bar", but not "foo --- bar" (I
think), and also not "-----" (sometimes people might use such repeated
dashes e.g. to render arrows).
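
A rough sketch of the rules just described, using plain regular expressions.
The function name `smart_dashes` and the exact patterns are assumptions made
for illustration; a real implementation would live inside the parser so that
literals and other markup are left alone.

```python
import re

EN_DASH = "\u2013"
EM_DASH = "\u2014"

# Two hyphens with whitespace on both sides become an en-dash.
EN_PATTERN = re.compile(r"(?<=\s)--(?=\s)")
# Three hyphens with a non-space, non-hyphen character on both sides
# become an em-dash.
EM_PATTERN = re.compile(r"(?<=[^\s-])---(?=[^\s-])")


def smart_dashes(text):
    """Apply the em-dash rule first, then the en-dash rule."""
    text = EM_PATTERN.sub(EM_DASH, text)
    return EN_PATTERN.sub(EN_DASH, text)


for sample in ["foo -- bar", "foo--bar", "use --stylesheet",
               "foo---bar", "foo --- bar", "see -----> there"]:
    print(repr(sample), "->", repr(smart_dashes(sample)))
```

Only the first and fourth samples are changed; the others fall outside both
patterns and pass through untouched.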

Such an intelligent mechanism would greatly simplify inputting
reStructuredText.

For ellipses, I'm not entirely sure what to do.  For HTML, the &hellip;
ellipsis is often too narrow, but for LaTeX, it would be good to have
"foo..."  and "foo ..." both transformed to "foo\,\dots" ("foo",
narrow-space, ellipsis).  As this (narrow-space + ellipsis) does not
lead to very good results for HTML, writer-dependent handling would
probably be necessary if this were to be implemented in the reST parser.
Thus I think it would be a better idea to implement ellipsis-support in
the LaTeX writer, where it's actually necessary.  If I find the time, I
can post a patch.

So I propose the following:

* Add intelligent en-dash and em-dash transformation to the reST parser.
* Add intelligent ellipsis transformation to the LaTeX writer.

| If this is not possible in an authoring environment, a pre-processing
| stage can be added,

Not really.  A pre- or post-processor cannot distinguish between literal
(= monospaced) and normal text, just to name one problem.

| or a table of substitution definitions can be devised.

Substitutions are not very readable and need to be learned by human
document writers.  And after all, I don't see any disadvantages in
adding an automatic transformation.

-- 
When replying to my email address, please ensure
that the mail header contains 'Felix Wiemann'.

http://www.ososo.de/

Re: [Docutils-users] Re: rendering ellipsis_

From: David G. <go...@py...> - 2004-10-17 19:24:01

[Felix Wiemann]
 > For example, normal dash, en-dash and em-dash are hardly
 > distinguishable in a monospaced font.  And a 'true' ellipsis would
 > be rendered much too narrow in monospace.

Yes, there are limitations when using/requiring monospace typefaces.

 > And even if it's possible to enter such characters, it is not
 > intuitive.

I hope it's becoming more possible and intuitive though.  Perhaps I've
been spoiled, being used to the easy non-ASCII input methods that Macs
have had since the beginning.  I'm now setting up a Debian system
which will be my main desktop; I'll find out the current situation
there.

 > and if they were told to enter unicode-symbols, they would find it
 > extremely inconvenient (I do, too).

I think that's a failing of the operating systems people use.  The
world is moving toward internationalization, with Unicode and UTF-8 at
the forefront.  I hope it gets here soon so we can ignore the issue
completely.

 > In fact, it's very natural to write en-dashes as two normal dashes
 > -- like this.  Or maybe also em-dashes---like this.  And you'd
 > always write ellipses like this...  Or like this ...

You illustrate a problem: there is no one standard for such
transformations.  And there's no way to distinguish between
"transformation desired" and "leave as-is" in normal text.

Such ambiguity is the reason why I decided to ignore the issue.  It's
a "refuse the temptation to guess" situation.

 > The LaTeX writer already does the en-/em-dash transformation
 > (because LaTeX automatically transforms '--' into a real en-dash and
 > the LaTeX writer doesn't escape dashes), and I have been using them
 > and found them quite convenient.

I question whether the LaTeX writer *should* be doing this.  At the
least it should be an option, disabled by default.

 > However, sometimes this behavior is undesired, e.g. when typing
 > options, like --stylesheet (without surrounding ``literal quotes``).

Exactly.

 > An intelligent replacement mechanism in the reStructuredText parser
 > would fix this problem, because it could transform "foo -- bar", but
 > not "foo--bar" nor " --bar".  And "foo---bar", but not "foo --- bar"
 > (I think), and also not "-----" (sometimes people might use such
 > repeated dashes e.g. to render arrows).
 >
 > Such an intelligent mechanism would greatly simplify inputting
 > reStructuredText.

Again, any such system should be optional, and disabled by default.

 > So I propose the following:
 >
 > * Add intelligent en-dash and em-dash transformation to the reST
 >   parser.
 > * Add intelligent ellipsis transformation to the LaTeX writer.

You may be opening up a big can of worms.  Once the underlying system
is there, won't there be a bunch of requests for (potentially
conflicting) additions?  When will it stop?

 > | If this is not possible in an authoring environment, a
 > | pre-processing stage can be added,
 >
 > Not really.  A pre- or post-processor cannot distinguish between
 > literal (= monospaced) and normal text, just to name one problem.

That's true.

 > I don't see any disadvantages in adding an automatic transformation.

I do, because it won't do what I want 100% of the time.  It has to be
optional.

-- 
David Goodger <http://python.net/~goodger>

Re: [Docutils-users] Re: rendering ellipsis_

From: Aleksey G. <agu...@me...> - 2004-10-17 21:32:52

David Goodger writes:
> [Felix Wiemann]
>  > I don't see any disadvantages in adding an automatic transformation.
>
> I do, because it won't do what I want 100% of the time.  It has to be
> optional.

Nothing in RST does "what I want" 100% of the time. Yet it works out
nicely as long as the mismatch is more often the exception than the
rule, and there is a way to override the default behavior. IMO
intelligent '--' to em-dash / '...' to ellipsis substitution would be
no worse than, let's say, current hyperlink/text substitution rules.

-- 
Aleksey Gurtovoy
MetaCommunications Engineering

[Docutils-users] Re: rendering ellipsis_

From: Felix W. <Fel...@gm...> - 2004-10-17 21:56:43

David Goodger wrote:

> Felix Wiemann wrote:
>
>> And even if it's possible to enter such characters, it is not
>> intuitive.
>>
>> [...] and if they were told to enter unicode-symbols, they would find
>> it extremely inconvenient (I do, too).
>
> I think that's a failing of the operating systems people use.

Possibly, even though I personally rarely have the need to enter
non-ASCII characters, especially ones outside latin1.  My system is
still latin1-based (mostly because of my laziness), so things are
becoming quite messy if I start entering texts in UTF-8 (besides the
fact that it's difficult for me to enter arbitrary Unicode characters).
Sure it's my fault, but the need for UTF-8 (and similar cool
internationalized things) isn't urgent enough to make me solve the
problem.  I presume I'm not the only one.

> The world is moving toward internationalization, with Unicode and
> UTF-8 at the forefront.

Which doesn't mean that it gets easier to enter seldom-needed characters
like en-dashes in plain text.

> I hope it gets here soon so we can ignore the issue completely.

Not too soon, if at all.

Practicability counts, IMO.  Docutils should be useful *now*, not in
three years.

>> In fact, it's very natural to write en-dashes as two normal dashes
>> -- like this.  Or maybe also em-dashes---like this.  And you'd
>> always write ellipses like this...  Or like this ...
>
> You illustrate a problem: there is no one standard for such
> transformations.  And there's no way to distinguish between
> "transformation desired" and "leave as-is" in normal text.

I never saw two dashes surrounded by whitespace in any typeset text.
I.e., I don't think something like that exists in reality.  Same for
three dashes surrounded by non-whitespace.

And if someone needs it, he can escape one of the dashes: "foo \-- bar",
or "foo -\- bar", or "foo-\--bar", or "foo\---bar".
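
A sketch of how such escapes could coexist with the transformation, assuming
a single left-to-right pass in which a backslash protects the following
character.  The name `smart_dashes_with_escapes` and the patterns are
hypothetical, not part of Docutils.

```python
import re

TOKEN = re.compile(
    r"\\(?P<escaped>.)"                     # backslash escape: keep the char
    r"|(?P<em>(?<=[^\s-])---(?=[^\s-]))"    # foo---bar
    r"|(?P<en>(?<=\s)--(?=\s))",            # foo -- bar
    re.DOTALL,
)


def smart_dashes_with_escapes(text):
    """Transform dashes, but let a backslash suppress the transformation."""
    def replace(match):
        if match.group("escaped") is not None:
            # Drop the backslash and emit the protected character as-is.
            return match.group("escaped")
        if match.group("em") is not None:
            return "\u2014"
        return "\u2013"

    return TOKEN.sub(replace, text)


for sample in ["foo -- bar", r"foo \-- bar", r"foo -\- bar",
               "foo---bar", r"foo-\--bar", r"foo\---bar"]:
    print(repr(sample), "->", repr(smart_dashes_with_escapes(sample)))
```

Each escaped form comes out as plain ASCII hyphens with the backslash
removed, while the unescaped forms are converted.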

>> The LaTeX writer already does the en-/em-dash transformation
>> (because LaTeX automatically transforms '--' into a real en-dash and
>> the LaTeX writer doesn't escape dashes), and I have been using them
>> and found them quite convenient.
>
> I question whether the LaTeX writer *should* be doing this.

It shouldn't.  We'd get more flexibility if the reST parser did it
instead of the LaTeX writer, because it's currently impossible to escape
anything for the LaTeX writer, but for the reST parser it is possible.
I.e., "foo -\- bar" currently does not escape the dashes, but if the
reST parser did the transformation, it would work.

Furthermore, documents should not be written for one particular writer,
but if the dash-feature is only supported by the LaTeX writer, this is
exactly what happens.  (I have such documents which rely on "--" being
transformed to en-dash.)

> At the least it should be an option, disabled by default.

I don't think that's necessary, as it's possible to escape dashes.

>> However, sometimes this behavior is undesired, e.g. when typing
>> options, like --stylesheet (without surrounding ``literal quotes``).
>
> Exactly.

Simple cases like --stylesheet would be caught by a little bit of
intelligence (read: a proper regex), because there is no whitespace
after the two dashes.

>> So I propose the following:
>>
>> * Add intelligent en-dash and em-dash transformation to the reST
>>   parser.
>> * Add intelligent ellipsis transformation to the LaTeX writer.
>
> You may be opening up a big can of worms.  Once the underlying system
> is there, won't there be a bunch of requests for (potentially
> conflicting) additions?  When will it stop?

The fact that there is one such transformation doesn't mean we will be
adding anything, because there are IMO compelling reasons for
dash-transformation (which are: monospace rendering, enterability and
intuitiveness, usualness in plain texts, and rather high
unambiguousness), which don't exist for the other transformations
proposed in alternatives.txt.

Concerning the ellipsis, I'm not entirely sure if it's a good idea to
implement it, because it is not *that* important, sometimes the
transformation may be undesirable for some languages or
style-conventions, and it's not possible to escape special-cases because
the logic would be implemented in the LaTeX writer.  So we probably
rather shouldn't do the ellipsis transformation as long as these
problems aren't solved.

>> I don't see any disadvantages in adding an automatic transformation.
>
> I do, because it won't do what I want 100% of the time.  It has to be
> optional.

With sufficient intelligence, it will do so in 99.9% of cases.  (I can't
remember any case where the effect would have been undesired.)  Is
escapability sufficient optionalness for you?

Re: [Docutils-users] Re: rendering ellipsis_

From: David G. <go...@py...> - 2004-10-19 15:03:45

[David Goodger]
 >> You may be opening up a big can of worms.  Once the underlying
 >> system is there, won't there be a bunch of requests for
 >> (potentially conflicting) additions?  When will it stop?

[Felix Wiemann]
 > The fact that there is one such transformation doesn't mean we
 > will be adding anything, because there are IMO compelling reasons
 > for dash-transformation
...
 > which don't exist for the other transformations proposed in
 > alternatives.txt.

Compelling arguments could be put forth for any number of other
transformations.  docs/dev/rst/alternatives.txt doesn't list all
possible transformations, just a sampling.  I still think that once we
start down this path, it will be difficult to limit the uses of
character processing.  It will become a full-blown subsystem.  We must
be cautious.

 > (which are: monospace rendering, enterability and intuitiveness,
 > usualness in plain texts, and rather high unambiguousness),

Yes, those are compelling.  I'll change my vote to +0 (but read on).

 > Concerning the ellipsis, I'm not entirely sure if it's a good idea
 > to implement it, because it is not *that* important, sometimes the
 > transformation may be undesirable for some languages or
 > style-conventions,

Seems to me that *any* text transformation may be undesirable to
somebody, somewhere, sometime.

 > and it's not possible to escape special-cases because the logic
 > would be implemented in the LaTeX writer.

Why wouldn't the logic for ellipsis be in the parser?

 > Is escapability sufficient optionalness for you?

No.  That adds an extra burden for those people who *don't* want the
feature.  Better to make it a normally-disabled "power user" option
(or multiple options).  Then there's an expectation that the user will
know what they're getting into.

I say multiple options because there is no standard way to represent
the various dashes.  Some people use two hyphens for an em-dash (--),
some three (---).  According to `The Chicago Manual of Style`, two
hyphens is how typewritten manuscripts should represent an em-dash.
But we'd like to be able to represent an en-dash as well; 2-for-en and
3-for-em is convenient, but not universal.  Some people put spaces
around em-dashes --- like this --- and some don't---like this.
Typographically, the spaces are not correct and should be removed (at
least for common English usage---the mind boggles!).  Some people want
to distinguish em-dashes, but don't care about distinguishing between
en-dash & hyphen.

If we try to impose one set of conventions on all users, it will
inevitably conflict with someone's alternate conventions (not to
mention those who don't want any character processing at all!).  Even
if that is dismissed (reST is a markup language, after all), there are
variations in output requirements.

So these things would have to be options, and no, escaping doesn't cut
it.  Even options don't really cut it, because the processing is local
to the document, not the system on which it's being processed.  Pragma
directives would be ideal.

-- 
David Goodger <http://python.net/~goodger>

[Docutils-users] Dash-transformation (was: rendering ellipsis_)

From: Felix W. <Fel...@gm...> - 2004-10-21 15:21:12

With the current implementations, some documents are specifically
written for the LaTeX writer (because they rely on the
dash-transformation) and some are written specifically for the HTML
writer (because they rely on multiple dashes not to be transformed).

Considering the commonness of both en-/em-dashes and unix-style options,
it is indeed probable that this writer dependence exists for many
documents.

Furthermore, in LaTeX there are probably frequent false-positives,
because the dash-transformation is applied unconditionally, and it isn't
even escapable.

So we have a problem which needs to be solved.

After re-reading David's posting, I too finally came to the conclusion
that an intelligent guessing-algorithm might be a bad idea.

A somewhat radical but nonetheless simple and effective solution might
be to deactivate the transformation in the LaTeX writer.

However, then it should be possible to easily enter en-/em-dashes with
ASCII characters.

* I'd suggest adding built-in substitution definitions for "|--|" to
  en-dash and "|---|" to em-dash.

* And it would be necessary to write em-dashes without spaces around.

The latter thing doesn't work, however:

$ quicktest.py 
foo|---|
<document source="<stdin>">
    <paragraph>
        foo|---|
$ quicktest.py 
foo|---|bar
<stdin>:1: (WARNING/2) Inline substitution_reference start-string without end-string.
<document source="<stdin>">
    <paragraph>
        foo|---
        <problematic id="id2" refid="id1">
            |
        bar
    <system_message backrefs="id2" id="id1" level="2" line="1" source="<stdin>" type="WARNING">
        <paragraph>
            Inline substitution_reference start-string without end-string.

IMO the trailing space should be made omittable.  (I think it won't
cause any existing documents to break, because this change would only
turn invalid constructs into valid ones.)

Re: [Docutils-users] Dash-transformation

From: David G. <go...@py...> - 2004-10-27 20:11:20

[Felix Wiemann]
 > With the current implementations, some documents are specifically
 > written for the LaTeX writer (because they rely on the
 > dash-transformation) and some are written specifically for the HTML
 > writer (because they rely on multiple dashes not to be transformed).

That's bad.

 > So we have a problem which needs to be solved.

Yes.  IMO, it's a bug that the LaTeX writer implicitly performs any
dash transformation at all.  It's a dangerous convenience.

 > A somewhat radical but nonetheless simple and effective solution
 > might be to deactivate the transformation in the LaTeX writer.

+1

 > However, then it should be possible to easily enter en-/em-dashes
 > with ASCII characters.
 >
 > * I'd suggest adding built-in substitution definitions for "|--|" to
 >   en-dash and "|---|" to em-dash.

I don't know about inserting a set of predefined substitution
definitions into the parser.  But we could certainly include a set of
substitution files in Docutils.  Then the author could do:

     .. include:: <dashes.txt>

See <http://docutils.sf.net/docs/dev/todo.html#misc.include>; more
below.

 > * And it would be necessary to write em-dashes without spaces around.

Are you saying that substitution references should not require any
delimiters?  That won't work.  Substitution references are like any
other reST inline markup; the start-string and end-string recognition
rules must apply in order to avoid ambiguity
(http://docutils.sf.net/docs/ref/rst/restructuredtext.html#inline-markup).

This is the best we can do right now:

$ quicktest.py
foo\ |---|\ bar
<document source="<stdin>">
     <paragraph>
         foo
         <substitution_reference refname="---">
             ---
         bar
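
The behaviour shown in these quicktest runs can be approximated with a small
sketch.  It is deliberately simplified: the real inline markup recognition
rules also accept certain punctuation around the delimiters, and the names
`SUBST_REF` and `apply_substitutions` are hypothetical.

```python
import re

# Simplified rule: "|" must follow whitespace (or start the text), and the
# closing "|" must be followed by whitespace, a backslash, or end of text.
SUBST_REF = re.compile(r"(?<!\S)\|([^|\s](?:[^|]*[^|\s])?)\|(?=\s|\\|$)")


def apply_substitutions(text, definitions):
    """Replace recognised |name| references, then drop escaped spaces."""
    def replace(match):
        # Unknown names are left untouched.
        return definitions.get(match.group(1), match.group(0))

    replaced = SUBST_REF.sub(replace, text)
    # A backslash-escaped space is removed from the output entirely.
    return replaced.replace("\\ ", "")


defs = {"---": "\u2014"}
print(apply_substitutions("foo|---|bar", defs))       # not recognised
print(apply_substitutions("foo |---| bar", defs))     # spaces remain
print(apply_substitutions("foo\\ |---|\\ bar", defs)) # no spaces remain
```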

 > IMO the trailing space should be made omittable.

We'd still need a leading space.  With an omissible trailing space,
the best we'd be able to do would be

     foo\ |---|bar

That isn't much better than the current "foo\ |---|\ bar".  Certainly
not worth the ambiguity and effort.

But this gave me an idea.  In conjunction with a change to the
"unicode" directive, substitutions could become context-sensitive.  We
could add a "trim" option to the "unicode" directive, as follows:

     .. |--| unicode:: U+02013 .. EN DASH
        :trim:
     .. |---| unicode:: U+02014 .. EM DASH
        :trim:

Then this input:

     foo |---| bar

could become this output:

     foo&mdash;bar
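
A minimal sketch of the proposed behaviour, assuming that "trim" means
removing all whitespace directly adjacent to the reference.  The function
`substitute` is hypothetical and only models the effect on plain text, not
the Docutils document tree.

```python
import re


def substitute(text, name, replacement, trim=False):
    """Replace |name| references; with trim, also eat adjacent whitespace."""
    reference = re.escape("|" + name + "|")
    pattern = r"\s*" + reference + r"\s*" if trim else reference
    # A function is used as the replacement so that backslashes in
    # `replacement` are never interpreted as regex escapes.
    return re.sub(pattern, lambda match: replacement, text)


print(substitute("foo |---| bar", "---", "\u2014", trim=True))
print(substitute("foo |---| bar", "---", "\u2014", trim=False))
```

With `trim=True` the surrounding spaces serve only as markup delimiters and
disappear from the result; without it they are kept.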

And other characters can be used as markup delimiters, not just
spaces.  For example, hyphens can be used.  Alternative substitution
definitions I'm thinking of include:

     .. |M| unicode:: U+02014 .. EM DASH
        :trim: -
     .. |N| unicode:: U+02013 .. EN DASH
        :trim: -
     .. |?| unicode:: U+000AD .. SOFT HYPHEN
        :trim: -
     .. |!| unicode:: U+02011 .. NON-BREAKING HYPHEN
        :trim: -
     .. |#| unicode:: U+02012 .. FIGURE DASH
        :trim: -

So an em-dash could be written like this, similar to the proofreaders'
mark:

     foo-|M|-bar

and would produce (the equivalent of) this:

     foo&mdash;bar

Alternatively, XML entity names (|mdash|) could be used instead of the
cryptic symbols above (|M|).

Many space characters could also be defined:

     .. |emsp| unicode:: U+02003 .. EM SPACE
        :trim:
     .. |ensp| unicode:: U+02002 .. EN SPACE
        :trim:
     .. |puncsp| unicode:: U+02008 .. PUNCTUATION SPACE
        :trim:
     .. |numsp| unicode:: U+02007 .. DIGIT SPACE
        :trim:
     .. |thinsp| unicode:: U+02009 .. THIN SPACE
        :trim:
     .. |hairsp| unicode:: U+0200A .. HAIR SPACE
        :trim:
     .. |0sp| unicode:: U+0200B .. ZERO WIDTH SPACE
        :trim:
     .. |zwnj| unicode:: U+0200C .. ZERO WIDTH NON-JOINER
        :trim:
     .. |zwj| unicode:: U+0200D .. ZERO WIDTH JOINER
        :trim:
     .. |nbsp| unicode:: U+000A0 .. NO-BREAK SPACE
        :trim:

In fact, all of the character entity files in the add-on package
(http://docutils.sourceforge.net/tmp/charents.tgz, which should come
standard with Docutils) could have space-trimmed alternatives.

Discussion welcome.

-- 
David Goodger <http://python.net/~goodger>

[Docutils-users] Re: Dash-transformation

From: Felix W. <Fel...@gm...> - 2004-10-30 20:12:24

David Goodger wrote:

> Felix Wiemann wrote:
>
>> * I'd suggest adding built-in substitution definitions for "|--|" to
>>   en-dash and "|---|" to em-dash.
>
> I don't know about inserting a set of predefined substitution
> definitions into the parser.  But we could certainly include a set of
> substitution files in Docutils.  Then the author could do:
>
>      .. include:: <dashes.txt>

I'm not sure if the benefit is big enough to justify the effort
of adding such a feature and maintaining a set of 'standard'
substitution files.

Probably it's best to just require the document author to include his
own substitution file(s).

>> IMO the trailing space should be made omittable.
>
> We'd still need a leading space.

I thought the leading space was optional; seems I got the current syntax
wrong...  8-)

> But this gave me an idea.  In conjunction with a change to the
> "unicode" directive, substitutions could become context-sensitive.  We
> could add a "trim" option to the "unicode" directive, as follows:
>
>      .. |--| unicode:: U+02013 .. EN DASH
>         :trim:
>      .. |---| unicode:: U+02014 .. EM DASH
>         :trim:

Looks nice.

But what about multi-line unicode definitions?  Recognize the option iff
the last line is ':trim:'?  Not too elegant but it could work.

> And other characters can be used as markup delimiters, not just
> spaces.  For example, hyphens can be used.

I think that would be over-engineering.  We don't *really* need it, do
we?

Re: [Docutils-users] Re: Dash-transformation

From: David G. <go...@py...> - 2004-11-01 04:39:32

[David Goodger]
 >> I don't know about inserting a set of predefined substitution
 >> definitions into the parser.  But we could certainly include a set
 >> of substitution files in Docutils.  Then the author could do:
 >>
 >>      .. include:: <dashes.txt>

[Felix Wiemann]
 > I'm not sure if the benefit is big enough to justify the
 > effort of adding such a feature and maintaining a set of 'standard'
 > substitution files.

I think it may be justified, although it doesn't have to be done right
away.  I'm -1 on adding any built-in substitution definitions; a set
of standard substitution definition files is the closest I'd agree to.

 > But what about multi-line unicode definitions?  Recognize the option
 > iff the last line is ':trim:'?

That's not an issue.  It's taken care of by the directive parsing
code.  I added a "trim" option to the "unicode" directive; it doesn't
do anything except set an attribute.  Here's the result:

$ quicktest.py <<EOF
.. |x| unicode:: U+0041
    U+0042
    :trim:

|x|
EOF
<document source="<stdin>">
     <substitution_definition name="x" trim="1">
         A
         B
     <paragraph>
         <substitution_reference refname="x">
             x

Note the 'trim="1"' in <substitution_definition ...>.

 >> And other characters can be used as markup delimiters, not just
 >> spaces.  For example, hyphens can be used.
 >
 > I think that would be over-engineering.  We don't *really* need it,
 > do we?

Perhaps not right away, but I anticipate it may become necessary if
the feature becomes popular.  I'd be happy just to add it to the to-do
list with a big "?", for now.

-- 
David Goodger <http://python.net/~goodger>

[Docutils-users] Re: Dash-transformation

From: Felix W. <Fel...@gm...> - 2004-11-08 19:40:25

David Goodger wrote:

> I just implemented new options for the "unicode" directive: "ltrim",
> "rtrim", and "trim" (trim whitespace from the left, right, or both
> sides of substitution references when applied).

Great; thank you.

I'm just wondering if it were a good idea to allow these options for all
directives (not only "unicode") when they occur in a substitution
definition.  Because I recently could have used something like this:

    .. |,| raw:: latex
       :trim:

       \,

    .. "\," inserts a narrow space in LaTeX.

    This is a phone number:
    +12-34 |,| 56-7 |,| 89 |,| 01

    This is the same number without nice spaces in LaTeX:
    +12-3456-78901

    (Both numbers are rendered identically in HTML.)

I could imagine similar scenarios for images, and possibly also for
replacement text.

>>> And other characters can be used as markup delimiters, not just
>>> spaces.  For example, hyphens can be used.
>>
>> I think that would be over-engineering.  We don't *really* need it,
>> do we?
>
> Perhaps not right away, but I anticipate it may become necessary if
> the feature becomes popular.

On second thought, it might be very useful indeed.  :)

    .. |--| unicode:: U+2013
       .. en-dash, trimming only hyphens, not spaces
       :trim: -

    This is an en-dash |--| as you would insert it in German and
    sometimes in English (mostly UK, I think).  And this is a range from
    50 to 100: 50-|--|-100; rendered as 50<endash>100, without spaces.

Syntax proposal:

    ":ltrim:" adds ltrim=" " as attribute; ":ltrim: -" adds ltrim="-";
    same for any other character.  It is not possible to activate
    trimming of multiple characters (e.g. both spaces and hyphens).

    Same for :rtrim: and :trim:.
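
    A sketch of this syntax proposal on plain text.  The function
    `substitute` and its `ltrim`/`rtrim` parameters are hypothetical, and it
    assumes that every adjacent occurrence of the trim character is removed.

```python
import re


def substitute(text, name, replacement, ltrim="", rtrim=""):
    """Replace |name| and strip one kind of character on each side.

    ltrim/rtrim hold the single character to strip (for example " " or "-");
    an empty string disables trimming on that side.
    """
    reference = re.escape("|" + name + "|")
    left = "(?:" + re.escape(ltrim) + ")*" if ltrim else ""
    right = "(?:" + re.escape(rtrim) + ")*" if rtrim else ""
    return re.sub(left + reference + right, lambda match: replacement, text)


# ":trim: -" corresponds to ltrim="-" and rtrim="-".
print(substitute("an en-dash |--| like this", "--", "\u2013", "-", "-"))
print(substitute("a range: 50-|--|-100", "--", "\u2013", "-", "-"))
```

    With hyphen trimming, spaces around the reference are preserved and only
    the delimiting hyphens are removed.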

What d'you think?  Useful or feature creep?

Re: [Docutils-users] Re: Dash-transformation

From: David G. <go...@py...> - 2004-11-10 04:11:24

[Felix Wiemann]
 > I'm just wondering if it were a good idea to allow these options for
 > all directives (not only "unicode") when they occur in a
 > substitution definition.

Seems like a good idea to me.

[David Goodger]
 >>>> And other characters can be used as markup delimiters, not just
 >>>> spaces.  For example, hyphens can be used.
...
 > On second thought, it might be very useful indeed.  :)
 >
 >     .. |--| unicode:: U+2013
 >        .. en-dash, trimming only hyphens, not spaces
 >        :trim: -
 >
 >     This is an en-dash |--| as you would insert it in German and
 >     sometimes in English (mostly UK, I think).  And this is a range
 >     from 50 to 100: 50-|--|-100; rendered as 50<endash>100, without
 >     spaces.
 >
 > Syntax proposal:
 >
 >     ":ltrim:" adds ltrim=" " as attribute; ":ltrim: -" adds
 >     ltrim="-"; same for any other character.  It is not possible to
 >     activate trimming of multiple characters (e.g. both spaces and
 >     hyphens).
 >
 >     Same for :rtrim: and :trim:.
 >
 > What d'you think?  Useful or feature creep?

Potentially useful.  The "trim" attributes would have to match the
context for the substitution to be applied.  And multiple contexts
would have to be supported.  So we'd have to support multiple
substitution definitions with the same substitution text but different
trim contexts.

-- 
David Goodger <http://python.net/~goodger>

Re: [Docutils-users] Dash-transformation

From: Felix W. <Fel...@gm...> - 2004-11-10 18:10:11

David Goodger wrote:

> Felix Wiemann wrote:
>
>>     .. |--| unicode:: U+2013
>>        .. en-dash, trimming only hyphens, not spaces
>>        :trim: -
>
> Potentially useful.

Great.

> The "trim" attributes would have to match the context for the
> substitution to be applied.

Why?  Given the definition above, I can insert an en-dash with spaces
around |--| like this |--| and I can insert an en-dash without
spaces-|--|-by surrounding it with hyphens (which is sometimes needed,
too).  This is very handy, so in fact I'd rather want the "trim"
attribute *not* to have to match the context of the substitution
reference.

Supporting multiple substitution definitions only differing in their
trim-attributes would add a lot of unnecessary complexity, and I'm not
convinced at all that we are ever going to need it.

So I think we rather shouldn't make substitutions context-sensitive.
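
A minimal sketch of the non-context-sensitive behaviour argued for above.
It assumes that a ":trim: -" option would strip any run of the given
character directly next to the reference; the proposal does not spell this
out.  The function name apply_substitution and its parameters are
hypothetical, and this is not Docutils code.

```python
import re


def apply_substitution(text, name, replacement, trim=None):
    """Replace every |name| reference in text with replacement.

    If trim is given, runs of that single character directly before and
    after the reference are removed too (assumed semantics, see above).
    """
    reference = re.escape("|%s|" % name)
    if trim:
        edge = "(?:%s)*" % re.escape(trim)
        reference = edge + reference + edge
    # A function replacement avoids backslash processing in `replacement`.
    return re.sub(reference, lambda match: replacement, text)
```

With trim="-", the one definition serves both uses: "50-|--|-100" loses its
hyphens, while "around |--| like this" keeps its spaces, because only the
trim character is removed.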

-- 
Felix Wiemann -- http://www.ososo.de/

Re: [Docutils-users] Dash-transformation

From: David G. <go...@py...> - 2004-11-11 03:56:01

Attachments: signature.asc

[David Goodger]
 >> The "trim" attributes would have to match the context for the
 >> substitution to be applied.

[Felix Wiemann]
 > Why?  Given the definition above, I can insert an en-dash with
 > spaces around |--| like this |--| and I can insert an en-dash
 > without spaces-|--|-by surrounding it with hyphens (which is
 > sometimes needed, too).

I misunderstood.  I was thinking about my previous proposal for
dashes, like:

     .. |M| unicode:: U+02014 .. EM DASH
        :trim: -

And similarly for spaces:

     .. |emsp| unicode:: U+02003 .. EM SPACE
        :trim:

I had originally thought of this for spaces:

     .. |M| unicode:: U+02003 .. EM SPACE
        :trim:

So "word-|M|-word" would result in an em-dash, and "word |M| word"
would result in an em-space.  The substitutions would be
context-sensitive.  Perhaps not that great of an idea.

But now that I do understand what you meant, I don't like it so much.
Needing to write extra hyphens in order not to get spaces around an
em-dash is ugly and a kludge.  Even target cases, like "50-|--|-100",
are ugly.  I'm thinking that ":trim: -" might not be such a good idea
after all.
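
For comparison, the context-sensitive reading described above (one name,
several definitions selected by the surrounding delimiter) could be
sketched like this.  The table DEFINITIONS and the function substitute_m
are hypothetical.  The sketch assumes that exactly one delimiter character
on each side selects the definition and that both sides must agree.

```python
import re

# Hypothetical table: delimiter character -> replacement text for |M|.
DEFINITIONS = {
    "-": "\u2014",  # EM DASH, selected by "word-|M|-word"
    " ": "\u2003",  # EM SPACE, selected by "word |M| word"
}


def substitute_m(text):
    """Replace |M| according to the delimiter found on both sides."""
    def choose(match):
        left, right = match.group(1), match.group(2)
        if left == right and left in DEFINITIONS:
            # The delimiters are consumed along with the reference.
            return DEFINITIONS[left]
        return match.group(0)  # no matching context: leave untouched

    return re.sub(r"(.)\|M\|(.)", choose, text)
```

The same reference text yields different characters depending on its
neighbours, which is the subtlety objected to in this thread.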

-- 
David Goodger <http://python.net/~goodger>

[Docutils-users] Linux Installation Problem

From: <cj...@sy...> - 2004-11-11 20:07:12

In the .../site-packages/tools directory, I have the following command:
    ...  # buildhtml.py ../docs ../docs
and get the response below.  I tried the same thing in the archive 
directory, with a similar response.

How do I convert the basic .txt stuff to HTML?

Would it be possible to build this into the distutils activity?

Is it intended that the Docutils package will be available as a Debian 
package later?

Thanks,

Colin W.

/// Processing directory: ../docs
    ::: Processing: index.txt
../docs/index.txt:0: (ERROR/3) Document empty; must have contents.
/// Processing directory: ../docs/api
    ::: Processing: cmdline-tool.txt
../docs/api/cmdline-tool.txt:0: (ERROR/3) Document empty; must have 
contents.
    ::: Processing: publisher.txt
../docs/api/publisher.txt:0: (ERROR/3) Document empty; must have contents.
    ::: Processing: runtime-settings.txt
../docs/api/runtime-settings.txt:0: (ERROR/3) Document empty; must have 
contents.
/// Processing directory: ../docs/dev
    ::: Processing: testing.txt
../docs/dev/testing.txt:0: (ERROR/3) Document empty; must have contents.
    ::: Processing: release.txt
../docs/dev/release.txt:0: (ERROR/3) Document empty; must have contents.
    ::: Processing: pysource.txt
../docs/dev/pysource.txt:0: (ERROR/3) Document empty; must have contents.
    ::: Processing: todo.txt
../docs/dev/todo.txt:0: (ERROR/3) Document empty; must have contents.
    ::: Processing: enthought-rfp.txt
../docs/dev/enthought-rfp.txt:0: (ERROR/3) Document empty; must have 
contents.
    ::: Processing: enthought-plan.txt
../docs/dev/enthought-plan.txt:0: (ERROR/3) Document empty; must have 
contents.
    ::: Processing: policies.txt
../docs/dev/policies.txt:0: (ERROR/3) Document empty; must have contents.
    ::: Processing: website.txt
../docs/dev/website.txt:0: (ERROR/3) Document empty; must have contents.
    ::: Processing: semantics.txt
../docs/dev/semantics.txt:0: (ERROR/3) Document empty; must have contents.
/// Processing directory: ../docs/dev/rst
    ::: Processing: problems.txt
../docs/dev/rst/problems.txt:0: (ERROR/3) Document empty; must have 
contents.
    ::: Processing: alternatives.txt
../docs/dev/rst/alternatives.txt:0: (ERROR/3) Document empty; must have 
contents.
/// Processing directory: ../docs/ref
    ::: Processing: transforms.txt
../docs/ref/transforms.txt:0: (ERROR/3) Document empty; must have contents.
    ::: Processing: doctree.txt
../docs/ref/doctree.txt:15: (ERROR/3) Error in "contents" directive:
invalid option data: extension option field body may contain
a single paragraph only (option "depth").

.. contents:: :depth: 1


../docs/ref/doctree.txt:222: (ERROR/3) Error in "contents" directive:
invalid option data: extension option field body may contain
a single paragraph only (option "depth").

.. contents:: :local:
              :depth: 1

../docs/ref/doctree.txt:4245: (ERROR/3) Error in "contents" directive:
invalid option data: extension option field body may contain
a single paragraph only (option "depth").

.. contents:: :local:
              :depth: 1

../docs/ref/doctree.txt:4495: (ERROR/3) Error in "contents" directive:
invalid option data: extension option field body may contain
a single paragraph only (option "depth").

.. contents:: :local:
              :depth: 1

../docs/ref/doctree.txt:0: (ERROR/3) Document empty; must have contents.
/// Processing directory: ../docs/ref/rst
    ::: Processing: restructuredtext.txt
../docs/ref/rst/restructuredtext.txt:0: (ERROR/3) Document empty; must 
have contents.
    ::: Processing: introduction.txt
../docs/ref/rst/introduction.txt:0: (ERROR/3) Document empty; must have 
contents.
    ::: Processing: roles.txt
../docs/ref/rst/roles.txt:0: (ERROR/3) Document empty; must have contents.
    ::: Processing: directives.txt
../docs/ref/rst/directives.txt:0: (ERROR/3) Document empty; must have 
contents.
/// Processing directory: ../docs/peps
    ::: Processing: pep-0256.txt
../docs/peps/pep-0256.txt:0: (ERROR/3) Document empty; must have contents.
DataError: Document does not begin with an RFC-2822 header; it is not a PEP.
Exiting due to error.  Use "--traceback" to diagnose.
Please report errors to <doc...@li...>.
Include "--traceback" output, Docutils version (0.3.5),
Python version (2.3.4), your OS type & version, and the
command line used.

Re: [Docutils-users] Linux Installation Problem

From: David G. <go...@py...> - 2004-11-12 14:19:51

Attachments: signature.asc

[cj...@sy...]
 > In the .../site-packages/tools directory, I have the following
 > command:
 >    ...  # buildhtml.py ../docs ../docs

You only need to specify "../docs" once.  Try that.

 > and get the response below.  I tried the same thing in the archive
 > directory, with a similar response.
...
[lots of errors like this:]
 > /// Processing directory: ../docs
 >    ::: Processing: index.txt
 > ../docs/index.txt:0: (ERROR/3) Document empty; must have contents.

*Are* these documents empty?  Perhaps the text encoding or line
endings have been altered.
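
One way to check those two possibilities is to look at the raw bytes of a
file that was reported as empty.  This is only a diagnostic sketch: the
helper names describe_bytes and describe_file are made up for
illustration, and nothing here establishes that line endings or encoding
actually caused the errors above.

```python
def describe_bytes(data):
    """Summarise raw file contents: emptiness, line endings, BOM."""
    crlf = data.count(b"\r\n")
    return {
        "size": len(data),
        "blank": not data.strip(),          # empty or whitespace only
        "crlf_lines": crlf,                 # DOS/Windows line endings
        "bare_cr": data.count(b"\r") - crlf,  # old Mac line endings
        "utf16_bom": data[:2] in (b"\xff\xfe", b"\xfe\xff"),
    }


def describe_file(path):
    """Read a file in binary mode and summarise it."""
    with open(path, "rb") as handle:
        return describe_bytes(handle.read())
```

Calling describe_file("../docs/index.txt") shows at a glance whether the
file really has no content or merely has unexpected line endings.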

How did you install Docutils?
From what source?
Where are all the parts installed?
Do you have multiple copies installed?
Is PYTHONPATH set?  To what?
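
Some of these questions can be answered from within Python itself.  The
sketch below uses only the standard library; find_copies and report are
hypothetical helper names, and the check simply looks for a directory
named after the package on each search-path entry.

```python
import os
import sys


def find_copies(package, search_path=None):
    """Return every directory on the search path that holds `package`."""
    if search_path is None:
        search_path = sys.path
    found = []
    for entry in search_path:
        # An empty entry in sys.path means the current directory.
        candidate = os.path.join(entry or os.curdir, package)
        if os.path.isdir(candidate) and candidate not in found:
            found.append(candidate)
    return found


def report(package="docutils"):
    """Collect the facts asked for above into a plain dictionary."""
    return {
        "copies": find_copies(package),
        "PYTHONPATH": os.environ.get("PYTHONPATH"),
        "python": sys.version.split()[0],
    }
```

More than one entry under "copies" would indicate multiple installed
copies; the first one listed is the one Python imports.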

 > How do I convert the basic .txt stuff to HTML?

Use buildhtml.py to convert a directory full of .txt files.  Use
rst2html.py to convert one at a time.  See
<http://docutils.sf.net/docs/user/tools.html>.
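
The same conversion can be driven from Python.  The following is a rough
sketch of a directory walk, not the actual buildhtml.py logic (which also
reads configuration files and handles PEPs specially).  plan_conversions
and convert_tree are hypothetical names; docutils.core.publish_file is the
Docutils entry point assumed here, and Docutils must be installed for
convert_tree to run.

```python
import os


def plan_conversions(root, suffix=".txt"):
    """Yield (source, destination) path pairs for every matching file."""
    for directory, _subdirs, files in os.walk(root):
        for name in sorted(files):
            if name.endswith(suffix):
                source = os.path.join(directory, name)
                yield source, os.path.splitext(source)[0] + ".html"


def convert_tree(root):
    """Convert each planned file to HTML with Docutils."""
    # Imported lazily so that the planning step works without Docutils.
    from docutils.core import publish_file

    for source, destination in plan_conversions(root):
        publish_file(source_path=source, destination_path=destination,
                     writer_name="html")
```

Running list(plan_conversions("../docs")) first shows which files would be
processed before anything is written.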

 > Would it be possible to build this into the distutils activity?

Yes, it's possible.  If and when depends on volunteers.  Care to
contribute?

 > Is it intended that the Docutils package will be available as a
 > Debian package later?

It is available now:

     apt-get install python-docutils

Current version is 0.3.3 in testing, and 0.3.5 in unstable.  Note that
the current CVS code is 0.3.6, with more features and fewer bugs
(IMHO) than releases.

-- 
David Goodger <http://python.net/~goodger>

Re: [Docutils-users] Dash-transformation

From: Beni C. <cb...@us...> - 2004-11-12 10:21:17

David Goodger wrote:
> [David Goodger]
>  >> The "trim" attributes would have to match the context for the
>  >> substitution to be applied.
> 
> [Felix Wiemann]
>  > Why?  Given the definition above, I can insert an en-dash with
>  > spaces around |--| like this |--| and I can insert an en-dash
>  > without spaces-|--|-by surrounding it with hyphens (which is
>  > sometimes needed, too).
> 
> I misunderstood.  I was thinking about my previous proposal for
> dashes, like:
> 
>     .. |M| unicode:: U+02014 .. EM DASH
>        :trim: -
> 
> And similarly for spaces:
> 
>     .. |emsp| unicode:: U+02003 .. EM SPACE
>        :trim:
> 
> I had originally thought of this for spaces:
> 
>     .. |M| unicode:: U+02003 .. EM SPACE
>        :trim:
> 
> So "word-|M|-word" would result in an em-dash, and "word |M| word"
> would result in an em-space.  

Yikes!@~  That's just way too subtle and only convenient for a few very 
marginal characters.

> The substitutions would be context-sensitive.  Perhaps not that great
> of an idea.
> 

-|M|-1

> But now that I do understand what you meant, I don't like it so much.
> Needing to write extra hyphens in order not to get spaces around an
> em-dash is ugly and a kludge.  Even target cases, like "50-|--|-100",
> are ugly.  I'm thinking that ":trim: -" might not be such a good idea
> after all.
> 
What if you do want a space?  Something *is* needed but ":trim: -" 
doesn't feel like the right thing.  Because what if you do want a hyphen?

-0

-- 
"Not just a none, but the None.  The definate article.
The alpha and omega, unchanging and unwilling to act."
--- Chris Cioffi against PEP 336 (Make None Callable).

[Docutils-users] Re: Rendering emdashes

From: Felix W. <Fel...@gm...> - 2004-10-18 19:20:22

Marcelo Huerta wrote:

> My intention was to say that, convenient as I might find the "---"
> shortcut (and I would really prefer "--", as it's our usual way to
> replace the em dash when writing a text file, en dashes being written
> simply as "-"),

No.  There is a difference between a hyphen and an en-dash (which is
very important for German, e.g.) which would be lost if en-dashes were
written as single hyphens ("-").  "--" for en-dash and "---" for em-dash
is also the way LaTeX does it.

> I wonder how it could be easily implemented to avoid inconvenience to
> a Spanish language writer.  [E.g. "---He reñido a un posadero."]

After some googling it looks like some people prefer using "foo --- bar"
instead of "foo---bar" in normal text, so there probably shouldn't be
any requirement about leading or trailing alphanumeric characters for
em-dashes anyway.  (I.e., there won't be any problem with your
dialog-example.)

Something I forgot in my earlier postings is that the en-dash may also
be needed for sequences ("pp. 15--18"), or compound expressions if one
of the components contains spaces ("post--World War 1").

So the transformation would probably look like this:

* Transform "---" to em-dash if it isn't preceded or followed by a dash.
* Transform "--" to en-dash if it's surrounded by whitespace or by
  alphanumeric characters.
* No transformation takes place if one of the dashes is escaped.

To give an example-implementation of what I mean:

------------------------------------------------------------------------------
--- docutils/parsers/rst/states.py.~1.78.~	2004-10-15 14:58:05.000000000 +0200
+++ docutils/parsers/rst/states.py	2004-10-18 21:00:12.000000000 +0200
@@ -483,7 +483,10 @@
         method, which enables additional interpreted text roles.
         """
 
-        self.implicit_dispatch = [(self.patterns.uri, self.standalone_uri),]
+        self.implicit_dispatch = [
+            (self.patterns.uri, self.standalone_uri),
+            (self.patterns.en_dash, lambda x, y: [nodes.Text(u'\u2013')]),
+            (self.patterns.em_dash, lambda x, y: [nodes.Text(u'\u2014')])]
         """List of (pattern, bound method) tuples, used by
         `self.implicit_inline`."""
 
@@ -680,7 +683,27 @@
                 r"""
                 %(start_string_prefix)s
                 (RFC(-|\s+)?(?P<rfcnum>\d+))
-                %(end_string_suffix)s""" % locals(), re.VERBOSE))
+                %(end_string_suffix)s""" % locals(), re.VERBOSE),
+          en_dash=re.compile(
+              r"""
+              (
+                (?<!\S)   # leading whitespace or nothing
+                --
+                (?=\s|\000[ \n]|$)   # trailing whitespace (possibly
+                                     # escaped) or nothing
+              |
+                (?<=\w)   # leading alphanumeric character
+                --
+                (?=\000?\w)   # trailing alphanumeric character,
+                              # possibly escaped
+              )
+              """, re.VERBOSE | re.UNICODE),
+          em_dash = re.compile(
+              r"""
+              (?<![\000-])   # No leading escape or dash.
+              ---
+              (?!-)   # No trailing dash.
+              """, re.VERBOSE))
 
     def quoted_start(self, match):
         """Return 1 if inline markup start-string is 'quoted', 0 if not."""
------------------------------------------------------------------------------

Some examples of what is transformed and what is not transformed (later,
we could also use this for testing if the patch is accepted):

Transformed Dashes
==================

En-dashes
---------

Foo -- bar, 10--20, foo--bar, foo\ --\ bar, foo--\bar.

-- at the beginning, at the end --

--

Em-dashes
---------

Foo --- bar, foo---bar, foo ---bar, foo--- bar, foo---\bar, -\ ---\ -,
foo/---bar, foo---/bar, foo---\\bar, foo\\---\\bar.

---at the beginning, at the end---

---

Untransformed Dashes
====================

En-dashes
---------

Foo-- bar, foo --bar, foo\--bar, foo-\-bar, foo--\ bar, foo/--bar,
foo--/bar, foo/--/bar, "--foo", "bar--", \\--foo, bar--\\.

--at the beginning, at the end--

Em-dashes
---------

Foo----bar, foo-----bar, foo\---bar, foo-\--bar, foo--\-bar.
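
The three rules listed earlier can also be tried outside Docutils with a
small standalone sketch.  It is a simplification of the patch, not a
replacement for it: it works on plain strings and ignores backslash
escapes (which the patch tracks as \000 markers).  The names EN_DASH,
EM_DASH and transform_dashes are illustrative only.

```python
import re

# "--" becomes an en-dash only with whitespace (or nothing) on both
# sides, or with an alphanumeric character on both sides.
EN_DASH = re.compile(
    r"""
      (?<!\S) -- (?=\s|$)   # whitespace (or nothing) on both sides
    |
      (?<=\w) -- (?=\w)     # alphanumeric character on both sides
    """,
    re.VERBOSE | re.UNICODE,
)

# "---" becomes an em-dash unless it is part of a longer run of dashes.
EM_DASH = re.compile(r"(?<!-)---(?!-)")


def transform_dashes(text):
    """Apply the em-dash rule, then the en-dash rule, to plain text."""
    text = EM_DASH.sub("\u2014", text)
    return EN_DASH.sub("\u2013", text)
```

Unescaped items from the lists above behave as described: "10--20" and
"Foo -- bar" gain an en-dash, "foo---bar" gains an em-dash, while
"Foo-- bar" and "Foo----bar" are left alone.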

-- 
When replying to my email address, please ensure
that the mail header contains 'Felix Wiemann'.

http://www.ososo.de/

[Docutils-users] Re: Rendering emdashes

From: Felix W. <Fel...@gm...> - 2004-10-19 00:38:58

Marcelo Huerta wrote:

> Felix Wiemann wrote:
>
>> (I.e., there won't be any problem with your dialog-example.)
>
> It would be problematic for the rendering. Inter-dialog observations
> are included in Spanish by inserting emdashes which *must* be in
> contact with the text in some part and separated in others; otherwise
> it's a syntactic error. For example:
>
> English version:
>
> "Yes," he told me, "I must finish this work right now." I hated his
> stupid smile.
>
> Spanish version, -- instead of emdash:
>
> --Sí --me dijo él--, tengo que terminar este trabajo ya. --Odiaba su
> estúpida sonrisa.

In fact these dashes wouldn't be transformed, but they are all
en-dashes, not em-dashes.  I suppose you mean:

---Sí ---me dijo él---, tengo que terminar este trabajo ya. ---Odiaba su
estúpida sonrisa.

These dash-groups are all correctly transformed to em-dashes.

-- 
When replying to my email address, please ensure
that the mail header contains 'Felix Wiemann'.

http://www.ososo.de/

[Docutils-users] Re: Linux Installation Problem

From: Felix W. <Fel...@gm...> - 2004-11-13 14:15:02

"cj...@sy..." <cj...@sy...> wrote:

> In the .../site-packages/tools directory, I have the following
> command:

That's the wrong directory.  It should be docutils/tools/.

Hmm.  Are you sure you installed Docutils by running "python setup.py
install" and not by copying it to /usr/lib/python2.x/site-packages/?

-- 
When replying to my email address, please ensure
that the mail header contains 'Felix Wiemann'.

http://www.ososo.de/

[Docutils-users] Rendering emdashes (Was: Re: rendering ellipsis)

From: Marcelo H. <mg...@sp...> - 2004-10-18 03:53:16

On 17/10/2004 at 18:57, Felix Wiemann <Fel...@gm...> said,
in his message "[Docutils-users] Re: rendering ellipsis_":

> I never saw two dashes surrounded by whitespace in any typeset text.
> I.e., I don't think something like that exists in reality.  Same for
> three dashes surrounded by non-whitespace.

> And if someone needs it, he can escape one of the dashes: "foo \-- bar",
> or "foo -\- bar", or "foo-\--bar", or "foo\---bar".

How would you address the converse situation, meaning, you *need*
to convert "triple dash + nonspace" into "emdash + nonspace"? That's the
way dialogs are written in Spanish. It would be *extremely*
inconvenient to have to escape a space for each line of dialog, for
example.

 ---He reñido a un posadero.
 ---¿Por qué? ¿Cuándo? ¿Dónde? ¿Cómo?
 ---Porque cuando donde como sirven mal, me desespero.

-- 
                    o-=< Marcelo >=-o

caballo de tiro. Equino de kermesse.
  --Del "Bichonario" (Giménez/Wright)

Re: [Docutils-users] Rendering emdashes (Was: Re: rendering ellipsis)

From: David G. <go...@py...> - 2004-10-18 05:09:19

Attachments: signature.asc

[Marcelo Huerta]
> How would you address the converse situation, meaning, you *need*
> to convert "triple dash + nonspace" into "emdash + nonspace"? That's the
> way dialogs are written in Spanish. It would be *extremely*
> inconvenient to have to escape a space for each line of dialog, for
> example.
> 
>  ---He reñido a un posadero.
>  ---¿Por qué? ¿Cuándo? ¿Dónde? ¿Cómo?
>  ---Porque cuando donde como sirven mal, me desespero.

Marcelo, could you please clarify: in Spanish, is dialogue written 
with three dashes, or with one em dash?

Thanks.

-- David Goodger

[Docutils-users] Rendering emdashes

From: Marcelo H. <mg...@sp...> - 2004-10-18 12:20:08

On 18/10/2004 at 02:08, David Goodger <go...@py...> said, in
his message "[Docutils-users] Rendering emdashes (Was: Re: rendering
ellipsis)":

> Marcelo, could you please clarify: in Spanish, is dialogue written
> with three dashes, or with one em dash?

I meant an em dash, of course. Sorry for not being clear. My intention
was to say that, convenient as I might find the "---" shortcut (and I
would really prefer "--", as it's our usual way to replace the em dash
when writing a text file, en dashes being written simply as "-"), I
wonder how it could be easily implemented to avoid inconvenience to a
Spanish language writer.

-- 
                    o-=< Marcelo >=-o

cacharro. Animal joven para contener líquidos.
  --Del "Bichonario" (Giménez/Wright)
