From: Kjetil T. H. <kje...@if...> - 2004-10-06 20:19:50
|
in order to render ``...`` correctly, LaTeX wants ``\ldots``, and HTML
wants ``…``. it would be nice if Docutils did this transformation.

.. _ellipsis: http://en.wikipedia.org/wiki/Ellipsis

[#]_

.. [#] has anyone looked at fitting reST-parsing into Gnus, yet? :-)

--
Kjetil T.
|
From: David G. <go...@py...> - 2004-10-07 00:53:38
Attachments:
signature.asc
|
[Kjetil Torgrim Homme]
> in order to render ``...`` correctly, LaTeX wants ``\ldots``, and HTML
> wants ``…``. it would be nice if Docutils did this
> transformation.

Docutils doesn't do text transformations like that (changing "..." to
a single ellipsis character). For the reasons why, please see
<http://docutils.sf.net/docs/dev/rst/alternatives.html#character-processing>.

Question 2.7 of the FAQ (http://docutils.sourceforge.net/FAQ.html)
directly addresses this issue. Short summary: enter the real ellipsis
character, using UTF-8 or another encoding, or a workaround.

The LaTeX writer already translates U+2026 to "\dots". Is that
equivalent to "\ldots"? (I'm not a TeX expert.)

--
David Goodger <http://python.net/~goodger>
|
From: Kjetil T. H. <kje...@if...> - 2004-10-07 02:21:33
|
On Wed, 2004-10-06 at 20:53 -0400, David Goodger wrote:
> [Kjetil Torgrim Homme]
> > in order to render ``...`` correctly, LaTeX wants ``\ldots``, and HTML
> > wants ``…``. it would be nice if Docutils did this
> > transformation.
>
> Docutils doesn't do text
> transformations like that (changing "..." to a single ellipsis
> character). For the reasons why, please see
> <http://docutils.sf.net/docs/dev/rst/alternatives.html#character-processing>.

ah. a text-replace directive looks like a very nice solution for
this. I could then make a file with the transformations I like and
include it in my files. I can't simply add a simplistic preprocessor
(such as a sed script), since it doesn't know about reST syntax. it
would mangle literals and usage such as

  Subsubsection
  .............

> The LaTeX writer already translates U+2026 to "\dots". Is that
> equivalent to "\ldots"? (I'm not a TeX expert.)

sorry, I got it wrong. it's called \ldots in a math environment, so
\dots and $\ldots$ are equivalent.

--
Kjetil T.
|
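The objection to a syntax-unaware preprocessor can be illustrated with a few lines of Python. This is only a sketch: `naive_ellipsis` is a hypothetical stand-in for the kind of sed script mentioned above, not anything that exists in Docutils.

```python
# Hypothetical stand-in for a one-line sed script: it knows nothing about reST.
def naive_ellipsis(text: str) -> str:
    """Replace every run of three dots with U+2026, wherever it occurs."""
    return text.replace("...", "\u2026")


source = "\n".join([
    "Subsubsection",
    ".............",  # 13-dot section underline
    "",
    "Type ``...`` to get an ellipsis...",
])

mangled = naive_ellipsis(source).split("\n")

# The underline is now shorter than its title, so it is no longer a valid
# section underline, and the inline literal has lost its three ASCII dots.
print(len(mangled[0]), len(mangled[1]))
print(ascii(mangled[3]))
```

Both the section underline and the inline literal are damaged, which is why the transformation would have to happen with knowledge of the reST syntax rather than before parsing.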
From: Felix W. <Fel...@gm...> - 2004-10-16 20:30:04
|
David Goodger wrote:

> Docutils doesn't do text transformations like that (changing "..." to
> a single ellipsis character). For the reasons why, please see
> <http://docutils.sf.net/docs/dev/rst/alternatives.html#character-processing>.

It reads:

| Docutils has no need of a character entity subsystem. Supporting
| Unicode and text encodings, character entities should be directly
| represented in the text: a copyright symbol should be represented by
| the copyright symbol character.

For the copyright sign, it's indeed a good idea to enter it directly.

However, for some characters, the direct unicode-representation looks
unnatural in plaintext. For example, normal dash, en-dash and em-dash
are hardly distinguishable in a monospaced font. And a 'true' ellipsis
would be rendered much too narrow in monospace.

And even if it's possible to enter such characters, it is not
intuitive. reStructuredText is often required to be edited by persons
not familiar with the markup language. Such persons normally do not
enter non-ASCII characters if there are existing ASCII characters
(e.g., they would write "--" instead of an en-dash) and if they were
told to enter unicode-symbols, they would find it extremely
inconvenient (I do, too).

In fact, it's very natural to write en-dashes as two normal dashes --
like this. Or maybe also em-dashes---like this. And you'd always
write ellipses like this... Or like this ...

The LaTeX writer already does the en-/em-dash transformation (because
LaTeX automatically transforms '--' into a real en-dash and the LaTeX
writer doesn't escape dashes), and I have been using them and found
them quite convenient.

However, sometimes this behavior is undesired, e.g. when typing
options, like --stylesheet (without surrounding ``literal quotes``).

An intelligent replacement mechanism in the reStructuredText parser
would fix this problem, because it could transform "foo -- bar", but
not "foo--bar" nor " --bar". And "foo---bar", but not "foo --- bar" (I
think), and also not "-----" (sometimes people might use such repeated
dashes e.g. to render arrows).

Such an intelligent mechanism would greatly simplify inputting
reStructuredText.

For ellipses, I'm not entirely sure what to do. For HTML, the …
ellipsis is often too narrow, but for LaTeX, it would be good to have
"foo..." and "foo ..." both transformed to "foo\,\dots" ("foo",
narrow-space, ellipsis). As this (narrow-space + ellipsis) does not
lead to very good results for HTML, writer-dependent handling would
probably be necessary if this were to be implemented in the reST
parser. Thus I think it would be a better idea to implement
ellipsis-support in the LaTeX writer, where it's actually necessary.
If I find the time, I can post a patch.

So I propose the following:

* Add intelligent en-dash and em-dash transformation to the reST
  parser.

* Add intelligent ellipsis transformation to the LaTeX writer.

| If this is not possible in an authoring environment, a pre-processing
| stage can be added,

Not really. A pre- or post-processor cannot distinguish between
literal (= monospaced) and normal text, just to name one problem.

| or a table of substitution definitions can be devised.

Substitutions are not very readable and need to be learned by human
document writers. And after all, I don't see any disadvantages in
adding an automatic transformation.

--
When replying to my email address, please ensure that the mail header
contains 'Felix Wiemann'.

http://www.ososo.de/
|
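The "intelligent replacement" described in the message above could be approximated with two regular expressions. The following is a rough sketch under assumed rules read off the examples given ("--" only between whitespace, "---" only between non-space, non-hyphen characters). The names `EN_DASH`, `EM_DASH` and `smart_dashes` are made up for illustration, and the sketch does not skip inline literals the way a real parser-level implementation would have to.

```python
import re

# Assumed rules, taken from the examples in the message above:
#   "--"  becomes an en-dash only with whitespace on both sides;
#   "---" becomes an em-dash only between non-space, non-hyphen characters.
EN_DASH = re.compile(r"(?<=\s)--(?=\s)")
EM_DASH = re.compile(r"(?<=[^\s-])---(?=[^\s-])")


def smart_dashes(text: str) -> str:
    """Apply the em-dash rule first, then the en-dash rule."""
    text = EM_DASH.sub("\u2014", text)
    return EN_DASH.sub("\u2013", text)


for sample in ["foo -- bar", "foo--bar", "use --stylesheet", "foo---bar",
               "foo --- bar", "a ----- b"]:
    # ascii() keeps the output readable on terminals without Unicode support.
    print(ascii(sample), "->", ascii(smart_dashes(sample)))
```

Only "foo -- bar" and "foo---bar" change; the option-like "--stylesheet" and the run of five hyphens are left alone.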
From: David G. <go...@py...> - 2004-10-17 19:24:01
Attachments:
signature.asc
|
[Felix Wiemann]
> For example, normal dash, en-dash and em-dash are hardly
> distinguishable in a monospaced font. And a 'true' ellipsis would
> be rendered much too narrow in monospace.

Yes, there are limitations when using/requiring monospace typefaces.

> And even if it's possible to enter such characters, it is not
> intuitive.

I hope it's becoming more possible and intuitive though. Perhaps I've
been spoiled, being used to the easy non-ASCII input methods that Macs
have had since the beginning. I'm now setting up a Debian system
which will be my main desktop; I'll find out the current situation
there.

> and if they were told to enter unicode-symbols, they would find it
> extremely inconvenient (I do, too).

I think that's a failing of the operating systems people use. The
world is moving toward internationalization, with Unicode and UTF-8 at
the forefront. I hope it gets here soon so we can ignore the issue
completely.

> In fact, it's very natural to write en-dashes as two normal dashes
> -- like this. Or maybe also em-dashes---like this. And you'd
> always write ellipses like this... Or like this ...

You illustrate a problem: there is no one standard for such
transformations. And there's no way to distinguish between
"transformation desired" and "leave as-is" in normal text. Such
ambiguity is the reason why I decided to ignore the issue. It's a
"refuse the temptation to guess" situation.

> The LaTeX writer already does the en-/em-dash transformation
> (because LaTeX automatically transforms '--' into a real en-dash and
> the LaTeX writer doesn't escape dashes), and I have been using them
> and found them quite convenient.

I question whether the LaTeX writer *should* be doing this. At the
least it should be an option, disabled by default.

> However, sometimes this behavior is undesired, e.g. when typing
> options, like --stylesheet (without surrounding ``literal quotes``).

Exactly.

> An intelligent replacement mechanism in the reStructuredText parser
> would fix this problem, because it could transform "foo -- bar", but
> not "foo--bar" nor " --bar". And "foo---bar", but not "foo --- bar"
> (I think), and also not "-----" (sometimes people might use such
> repeated dashes e.g. to render arrows).
>
> Such an intelligent mechanism would greatly simplify inputting
> reStructuredText.

Again, any such system should be optional, and disabled by default.

> So I propose the following:
>
> * Add intelligent en-dash and em-dash transformation to the reST
>   parser.
> * Add intelligent ellipsis transformation to the LaTeX writer.

You may be opening up a big can of worms. Once the underlying system
is there, won't there be a bunch of requests for (potentially
conflicting) additions? When will it stop?

> | If this is not possible in an authoring environment, a
> | pre-processing stage can be added,
>
> Not really. A pre- or post-processor cannot distinguish between
> literal (= monospaced) and normal text, just to name one problem.

That's true.

> I don't see any disadvantages in adding an automatic transformation.

I do, because it won't do what I want 100% of the time. It has to be
optional.

--
David Goodger <http://python.net/~goodger>
|
From: Aleksey G. <agu...@me...> - 2004-10-17 21:32:52
|
David Goodger writes:
> [Felix Wiemann]
> > I don't see any disadvantages in adding an automatic transformation.
>
> I do, because it won't do what I want 100% of the time. It has to be
> optional.

Nothing in RST does "what I want" 100% of the time. Yet it works out
nicely as long as the mismatch is more often the exception than the
rule, and there is a way to override the default behavior. IMO
intelligent '--' to em-dash / '...' to ellipsis substitution would be
no worse than, let's say, current hyperlink/text substitution rules.

--
Aleksey Gurtovoy
MetaCommunications Engineering
|
From: Felix W. <Fel...@gm...> - 2004-10-17 21:56:43
|
David Goodger wrote:

> Felix Wiemann wrote:
>
>> And even if it's possible to enter such characters, it is not
>> intuitive.
>>
>> [...] and if they were told to enter unicode-symbols, they would find
>> it extremely inconvenient (I do, too).
>
> I think that's a failing of the operating systems people use.

Possibly, even though I personally rarely have the need to enter
non-ASCII characters, especially ones outside latin1. My system is
still latin1-based (mostly because of my laziness), so things are
becoming quite messy if I start entering texts in UTF-8 (besides the
fact that it's difficult for me to enter arbitrary Unicode
characters). Sure it's my fault, but the need for UTF-8 (and similar
cool internationalized things) isn't urgent enough to make me solve
the problem. I presume I'm not the only one.

> The world is moving toward internationalization, with Unicode and
> UTF-8 at the forefront.

Which doesn't mean that it gets easier to enter seldom-needed
characters like en-dashes in plain text.

> I hope it gets here soon so we can ignore the issue completely.

Not too soon, if at all. Practicability counts, IMO. Docutils should
be useful *now*, not in three years.

>> In fact, it's very natural to write en-dashes as two normal dashes
>> -- like this. Or maybe also em-dashes---like this. And you'd
>> always write ellipses like this... Or like this ...
>
> You illustrate a problem: there is no one standard for such
> transformations. And there's no way to distinguish between
> "transformation desired" and "leave as-is" in normal text.

I never saw two dashes surrounded by whitespace in any typeset text.
I.e., I don't think something like that exists in reality. Same for
three dashes surrounded by non-whitespace. And if someone needs it,
he can escape one of the dashes: "foo \-- bar", or "foo -\- bar", or
"foo-\--bar", or "foo\---bar".

>> The LaTeX writer already does the en-/em-dash transformation
>> (because LaTeX automatically transforms '--' into a real en-dash and
>> the LaTeX writer doesn't escape dashes), and I have been using them
>> and found them quite convenient.
>
> I question whether the LaTeX writer *should* be doing this.

It shouldn't. We'd get more flexibility if the reST parser did it
instead of the LaTeX writer, because it's currently impossible to
escape anything for the LaTeX writer, but for the reST parser it is
possible. I.e., "foo -\- bar" currently does not escape the dashes,
but if the reST parser did the transformation, it would work.

Furthermore, documents should not be written for one particular
writer, but if the dash-feature is only supported by the LaTeX writer,
this is exactly what happens. (I have such documents which rely on
"--" being transformed to en-dash.)

> At the least it should be an option, disabled by default.

I don't think that's necessary, as it's possible to escape dashes.

>> However, sometimes this behavior is undesired, e.g. when typing
>> options, like --stylesheet (without surrounding ``literal quotes``).
>
> Exactly.

Simple cases like --stylesheet would be caught by a little bit of
intelligence (read: a proper regex), because there is no whitespace
after the two dashes.

>> So I propose the following:
>>
>> * Add intelligent en-dash and em-dash transformation to the reST
>>   parser.
>> * Add intelligent ellipsis transformation to the LaTeX writer.
>
> You may be opening up a big can of worms. Once the underlying system
> is there, won't there be a bunch of requests for (potentially
> conflicting) additions? When will it stop?

The fact that there is one such transformation doesn't mean we will be
adding anything, because there are IMO compelling reasons for
dash-transformation (which are: monospace rendering, enterability and
intuitiveness, usualness in plain texts, and rather high
unambiguousness), which don't exist for the other transformations
proposed in alternatives.txt.

Concerning the ellipsis, I'm not entirely sure if it's a good idea to
implement it, because it is not *that* important, sometimes the
transformation may be undesirable for some languages or
style-conventions, and it's not possible to escape special-cases
because the logic would be implemented in the LaTeX writer. So we
probably rather shouldn't do the ellipsis transformation as long as
these problems aren't solved.

>> I don't see any disadvantages in adding an automatic transformation.
>
> I do, because it won't do what I want 100% of the time. It has to be
> optional.

With sufficient intelligence, it will in 99.9% of cases. (I can't
remember any case where the effect would have been undesired.)

Is escapability sufficient optionalness for you?

--
When replying to my email address, please ensure that the mail header
contains 'Felix Wiemann'.

http://www.ososo.de/
|
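The escaping argument made in this message can be sketched as follows. It assumes the same whitespace-based dash rules as proposed earlier in the thread and a simplified model of reST backslash escapes, in which a backslash protects the next character and is then dropped. The names `convert`, `EN_DASH`, `EM_DASH` and `ESCAPE` are hypothetical, and nothing here reflects how the Docutils parser is actually implemented.

```python
import re

# "--" between whitespace becomes an en-dash (assumed rule from the thread).
EN_DASH = re.compile(r"(?<=\s)--(?=\s)")
# "---" between non-space characters becomes an em-dash; a backslash directly
# before the hyphens also blocks the match, so "foo\---bar" is left alone.
EM_DASH = re.compile(r"(?<=[^\s\\-])---(?=[^\s-])")
# Simplified escape handling: drop the backslash, keep the escaped character.
ESCAPE = re.compile(r"\\(.)")


def convert(text: str) -> str:
    """Transform unescaped dashes, then remove the escaping backslashes."""
    text = EM_DASH.sub("\u2014", text)
    text = EN_DASH.sub("\u2013", text)
    return ESCAPE.sub(r"\1", text)


for sample in ["foo -- bar", "foo \\-- bar", "foo -\\- bar",
               "foo---bar", "foo-\\--bar", "foo\\---bar"]:
    print(ascii(sample), "->", ascii(convert(sample)))
```

Each of the four escaped spellings listed in the message comes out as plain ASCII hyphens, while the unescaped forms are converted.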
From: David G. <go...@py...> - 2004-10-19 15:03:45
Attachments:
signature.asc
|
[David Goodger]
>> You may be opening up a big can of worms. Once the underlying
>> system is there, won't there be a bunch of requests for
>> (potentially conflicting) additions? When will it stop?

[Felix Wiemann]
> The fact that there is one such transformation doesn't mean we
> will be adding anything, because there are IMO compelling reasons
> for dash-transformation
...
> which don't exist for the other transformations proposed in
> alternatives.txt.

Compelling arguments could be put forth for any number of other
transformations. docs/dev/rst/alternatives.txt doesn't list all
possible transformations, just a sampling. I still think that once we
start down this path, it will be difficult to limit the uses of
character processing. It will become a full-blown subsystem. We must
be cautious.

> (which are: monospace rendering, enterability and intuitiveness,
> usualness in plain texts, and rather high unambiguousness),

Yes, those are compelling. I'll change my vote to +0 (but read on).

> Concerning the ellipsis, I'm not entirely sure if it's a good idea
> to implement it, because it is not *that* important, sometimes the
> transformation may be undesirable for some languages or
> style-conventions,

Seems to me that *any* text transformation may be undesirable to
somebody, somewhere, sometime.

> and it's not possible to escape special-cases because the logic
> would be implemented in the LaTeX writer.

Why wouldn't the logic for ellipsis be in the parser?

> Is escapability sufficient optionalness for you?

No. That adds an extra burden for those people who *don't* want the
feature. Better to make it a normally-disabled "power user" option
(or multiple options). Then there's an expectation that the user will
know what they're getting into.

I say multiple options because there is no standard way to represent
the various dashes. Some people use two hyphens for an em-dash (--),
some three (---). According to `The Chicago Manual of Style`, two
hyphens is how typewritten manuscripts should represent an em-dash.
But we'd like to be able to represent an en-dash as well; 2-for-en and
3-for-em is convenient, but not universal. Some people put spaces
around em-dashes --- like this --- and some don't---like this.
Typographically, the spaces are not correct and should be removed (at
least for common English usage---the mind boggles!). Some people want
to distinguish em-dashes, but don't care about distinguishing between
en-dash & hyphen.

If we try to impose one set of conventions on all users, it will
inevitably conflict with someone's alternate conventions (not to
mention those who don't want any character processing at all!). Even
if that is dismissed (reST is a markup language, after all), there are
variations in output requirements. So these things would have to be
options, and no, escaping doesn't cut it.

Even options don't really cut it, because the processing is local to
the document, not the system on which it's being processed. Pragma
directives would be ideal.

--
David Goodger <http://python.net/~goodger>
|
From: Felix W. <Fel...@gm...> - 2004-10-21 15:21:12
|
With the current implementations, some documents are specifically
written for the LaTeX writer (because they rely on the
dash-transformation) and some are written specifically for the HTML
writer (because they rely on multiple dashes not to be transformed).

Considering the commonness of both en-/em-dashes and unix-style
options, it is indeed probable that this writer dependence exists for
many documents. Furthermore, in LaTeX there are probably frequent
false-positives, because the dash-transformation is applied
unconditionally, and it isn't even escapable.

So we have a problem which needs to be solved.

After re-reading David's posting, I too finally came to the conclusion
that an intelligent guessing-algorithm might be a bad idea.

A somewhat radical but nonetheless simple and effective solution might
be to deactivate the transformation in the LaTeX writer.

However, then it should be possible to easily enter en-/em-dashes with
ASCII characters.

* I'd suggest adding built-in substitution definitions for "|--|" to
  en-dash and "|---|" to em-dash.

* And it would be necessary to write em-dashes without spaces around.

The latter thing doesn't work, however:

    $ quicktest.py
    foo|---|
    <document source="<stdin>">
        <paragraph>
            foo|---|
    $ quicktest.py
    foo|---|bar
    <stdin>:1: (WARNING/2) Inline substitution_reference start-string without end-string.
    <document source="<stdin>">
        <paragraph>
            foo|---
            <problematic id="id2" refid="id1">
                |
            bar
        <system_message backrefs="id2" id="id1" level="2" line="1" source="<stdin>" type="WARNING">
            <paragraph>
                Inline substitution_reference start-string without end-string.

IMO the trailing space should be made omittable. (I think it won't
cause any existing documents to break, because this change would only
turn invalid constructs into valid ones.)

--
When replying to my email address, please ensure that the mail header
contains 'Felix Wiemann'.

http://www.ososo.de/
|
From: David G. <go...@py...> - 2004-10-27 20:11:20
Attachments:
signature.asc
|
[Felix Wiemann]
> With the current implementations, some documents are specifically
> written for the LaTeX writer (because they rely on the
> dash-transformation) and some are written specifically for the HTML
> writer (because they rely on multiple dashes not to be transformed).

That's bad.

> So we have a problem which needs to be solved.

Yes. IMO, it's a bug that the LaTeX writer implicitly performs any
dash transformation at all. It's a dangerous convenience.

> A somewhat radical but nonetheless simple and effective solution
> might be to deactivate the transformation in the LaTeX writer.

+1

> However, then it should be possible to easily enter en-/em-dashes
> with ASCII characters.
>
> * I'd suggest adding built-in substitution definitions for "|--|" to
>   en-dash and "|---|" to em-dash.

I don't know about inserting a set of predefined substitution
definitions into the parser. But we could certainly include a set of
substitution files in Docutils. Then the author could do:

    .. include:: <dashes.txt>

See <http://docutils.sf.net/docs/dev/todo.html#misc.include>; more
below.

> * And it would be necessary to write em-dashes without spaces around.

Are you saying that substitution references should not require any
delimiters? That won't work. Substitution references are like any
other reST inline markup; the start-string and end-string recognition
rules must apply in order to avoid ambiguity
(http://docutils.sf.net/docs/ref/rst/restructuredtext.html#inline-markup).
This is the best we can do right now:

    $ quicktest.py
    foo\ |---|\ bar
    <document source="<stdin>">
        <paragraph>
            foo
            <substitution_reference refname="---">
                ---
            bar

> IMO the trailing space should be made omittable.

We'd still need a leading space. With an omissible trailing space,
the best we'd be able to do would be

    foo\ |---|bar

That isn't much better than the current "foo\ |---|\ bar". Certainly
not worth the ambiguity and effort.

But this gave me an idea. In conjunction with a change to the
"unicode" directive, substitutions could become context-sensitive. We
could add a "trim" option to the "unicode" directive, as follows:

    .. |--| unicode:: U+02013 .. EN DASH
       :trim:
    .. |---| unicode:: U+02014 .. EM DASH
       :trim:

Then this input:

    foo |---| bar

could become this output:

    foo—bar

And other characters can be used as markup delimiters, not just
spaces. For example, hyphens can be used. Alternative substitution
definitions I'm thinking of include:

    .. |M| unicode:: U+02014 .. EM DASH
       :trim: -
    .. |N| unicode:: U+02013 .. EN DASH
       :trim: -
    .. |?| unicode:: U+000AD .. SOFT HYPHEN
       :trim: -
    .. |!| unicode:: U+02011 .. NON-BREAKING HYPHEN
       :trim: -
    .. |#| unicode:: U+02012 .. FIGURE DASH
       :trim: -

So an em-dash could be written like this, similar to the proofreaders'
mark:

    foo-|M|-bar

and would produce (the equivalent of) this:

    foo—bar

Alternatively, XML entity names (|mdash|) could be used instead of the
cryptic symbols above (|M|).

Many space characters could also be defined:

    .. |emsp| unicode:: U+02003 .. EM SPACE
       :trim:
    .. |ensp| unicode:: U+02002 .. EN SPACE
       :trim:
    .. |puncsp| unicode:: U+02008 .. PUNCTUATION SPACE
       :trim:
    .. |numsp| unicode:: U+02007 .. DIGIT SPACE
       :trim:
    .. |thinsp| unicode:: U+02009 .. THIN SPACE
       :trim:
    .. |hairsp| unicode:: U+0200A .. HAIR SPACE
       :trim:
    .. |0sp| unicode:: U+0200B .. ZERO WIDTH SPACE
       :trim:
    .. |zwnj| unicode:: U+0200C .. ZERO WIDTH NON-JOINER
       :trim:
    .. |zwj| unicode:: U+0200D .. ZERO WIDTH JOINER
       :trim:
    .. |nbsp| unicode:: U+000A0 .. NO-BREAK SPACE
       :trim:

In fact, all of the character entity files in the add-on package
(http://docutils.sourceforge.net/tmp/charents.tgz, which should come
standard with Docutils) could have space-trimmed alternatives.

Discussion welcome.

--
David Goodger <http://python.net/~goodger>
|
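The intended effect of the proposed "trim" option can be modelled on plain strings. This is only an illustrative sketch: `Substitution` and `apply_substitutions` are hypothetical names, and Docutils resolves substitution references in the document tree rather than with a regular expression over the source text.

```python
import re
from dataclasses import dataclass


@dataclass
class Substitution:
    text: str           # replacement text
    trim: bool = False  # if True, swallow whitespace around the reference


# Optional spaces/tabs, a |name| reference, optional spaces/tabs.
REFERENCE = re.compile(r"(?P<left>[ \t]*)\|(?P<name>[^|\s]+)\|(?P<right>[ \t]*)")


def apply_substitutions(source: str, definitions: dict) -> str:
    def replace(match: re.Match) -> str:
        definition = definitions.get(match.group("name"))
        if definition is None:
            return match.group(0)  # unknown reference: leave untouched
        if definition.trim:
            return definition.text  # drop the surrounding whitespace
        return match.group("left") + definition.text + match.group("right")

    return REFERENCE.sub(replace, source)


definitions = {
    "--": Substitution("\u2013", trim=True),   # EN DASH
    "---": Substitution("\u2014", trim=True),  # EM DASH
    "copy": Substitution("\u00a9"),            # no trimming
}

print(ascii(apply_substitutions("foo |---| bar", definitions)))
print(ascii(apply_substitutions("Copyright |copy| 2004", definitions)))
```

With trimming enabled, "foo |---| bar" collapses to the three-part "foo", em-dash, "bar" form shown in the message, while an untrimmed substitution keeps its surrounding spaces.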
From: Felix W. <Fel...@gm...> - 2004-10-30 20:12:24
|
David Goodger wrote:

> Felix Wiemann wrote:
>
>> * I'd suggest adding built-in substitution definitions for "|--|" to
>>   en-dash and "|---|" to em-dash.
>
> I don't know about inserting a set of predefined substitution
> definitions into the parser. But we could certainly include a set of
> substitution files in Docutils. Then the author could do:
>
>     .. include:: <dashes.txt>

I'm not sure if the benefit is big enough to justify the effort of
adding such a feature and maintaining a set of 'standard' substitution
files. Probably it's best to just require the document author to
include his own substitution file(s).

>> IMO the trailing space should be made omittable.
>
> We'd still need a leading space.

I thought the leading space was optional; seems I got the current
syntax wrong... 8-)

> But this gave me an idea. In conjunction with a change to the
> "unicode" directive, substitutions could become context-sensitive. We
> could add a "trim" option to the "unicode" directive, as follows:
>
>     .. |--| unicode:: U+02013 .. EN DASH
>        :trim:
>     .. |---| unicode:: U+02014 .. EM DASH
>        :trim:

Looks nice.

But what about multi-line unicode definitions? Recognize the option
iff the last line is ':trim:'? Not too elegant but it could work.

> And other characters can be used as markup delimiters, not just
> spaces. For example, hyphens can be used.

I think that would be over-engineering. We don't *really* need it, do
we?

--
When replying to my email address, please ensure that the mail header
contains 'Felix Wiemann'.

http://www.ososo.de/
|
From: David G. <go...@py...> - 2004-11-01 04:39:32
Attachments:
signature.asc
|
[David Goodger]
>> I don't know about inserting a set of predefined substitution
>> definitions into the parser. But we could certainly include a set
>> of substitution files in Docutils. Then the author could do:
>>
>>     .. include:: <dashes.txt>

[Felix Wiemann]
> I'm not sure if the benefit is big enough to justify the
> effort of adding such a feature and maintaining a set of 'standard'
> substitution files.

I think it may be justified, although it doesn't have to be done right
away. I'm -1 on adding any built-in substitution definitions; a set
of standard substitution definition files is the closest I'd agree to.

> But what about multi-line unicode definitions? Recognize the option
> iff the last line is ':trim:'?

That's not an issue. It's taken care of by the directive parsing
code. I added a "trim" option to the "unicode" directive; it doesn't
do anything except set an attribute. Here's the result:

    $ quicktest.py <<EOF
    .. |x| unicode:: U+0041 U+0042
       :trim:

    |x|
    EOF
    <document source="<stdin>">
        <substitution_definition name="x" trim="1">
            A
            B
        <paragraph>
            <substitution_reference refname="x">
                x

Note the 'trim="1"' in <substitution_definition ...>.

>> And other characters can be used as markup delimiters, not just
>> spaces. For example, hyphens can be used.
>
> I think that would be over-engineering. We don't *really* need it,
> do we?

Perhaps not right away, but I anticipate it may become necessary if
the feature becomes popular. I'd be happy just to add it to the to-do
list with a big "?", for now.

--
David Goodger <http://python.net/~goodger>
|
From: Felix W. <Fel...@gm...> - 2004-11-08 19:40:25
|
David Goodger wrote:

> I just implemented new options for the "unicode" directive: "ltrim",
> "rtrim", and "trim" (trim whitespace from the left, right, or both
> sides of substitution references when applied).

Great; thank you.

I'm just wondering if it were a good idea to allow these options for
all directives (not only "unicode") when they occur in a substitution
definition. Because recently I could have used something like this:

    .. |,| raw:: latex
       :trim:

       \,
    .. "\," inserts a narrow space in LaTeX.

    This is a phone number: +12-34 |,| 56-7 |,| 89 |,| 01

    This is the same number without nice spaces in LaTeX: +12-3456-78901

(Both numbers are rendered identically in HTML.)

I could imagine similar scenarios for images, and possibly also for
replacement text.

>>> And other characters can be used as markup delimiters, not just
>>> spaces. For example, hyphens can be used.
>>
>> I think that would be over-engineering. We don't *really* need it,
>> do we?
>
> Perhaps not right away, but I anticipate it may become necessary if
> the feature becomes popular.

On a second thought, it might be very useful indeed. :)

    .. |--| unicode:: U+2013
       .. en-dash, trimming only hyphens, not spaces
       :trim: -

    This is an en-dash |--| as you would insert it in German and
    sometimes in English (mostly UK, I think). And this is a range
    from 50 to 100: 50-|--|-100; rendered as 50<endash>100, without
    spaces.

Syntax proposal:

    ":ltrim:" adds ltrim=" " as attribute; ":ltrim: -" adds
    ltrim="-"; same for any other character. It is not possible to
    activate trimming of multiple characters (e.g. both spaces and
    hyphens).

    Same for :rtrim: and :trim:.

What d'you think? Useful or feature creep?

--
When replying to my email address, please ensure that the mail header
contains 'Felix Wiemann'.

http://www.ososo.de/
|
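The syntax proposal in this message (one trim character per side, a space by default, a hyphen if given as the option argument) can be modelled roughly as below. The per-character argument is only a proposal in this thread, so the `ltrim`/`rtrim` fields here mirror the proposal, not an existing Docutils feature. `Substitution` and `expand` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Substitution:
    text: str
    ltrim: Optional[str] = None  # character to strip left of the reference
    rtrim: Optional[str] = None  # character to strip right of the reference


def expand(source: str, name: str, sub: Substitution) -> str:
    """Replace |name| and swallow runs of the trim character next to it."""
    left = f"[{re.escape(sub.ltrim)}]*" if sub.ltrim else ""
    right = f"[{re.escape(sub.rtrim)}]*" if sub.rtrim else ""
    pattern = re.compile(left + re.escape(f"|{name}|") + right)
    # A function replacement avoids backslash interpretation in sub.text.
    return pattern.sub(lambda match: sub.text, source)


# ":trim: -" as proposed: hyphens are trimmed, spaces are kept.
en_dash = Substitution("\u2013", ltrim="-", rtrim="-")
print(ascii(expand("a range: 50-|--|-100", "--", en_dash)))
print(ascii(expand("an en-dash |--| here", "--", en_dash)))

# Plain ":trim:" on a raw LaTeX thin space, as in the phone number example.
thin_space = Substitution("\\,", ltrim=" ", rtrim=" ")
print(expand("+12-34 |,| 56-7 |,| 89 |,| 01", ",", thin_space))
```

With the hyphen variant, the same `|--|` definition serves both the spaced and the unspaced en-dash, which is the convenience argued for above.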
From: David G. <go...@py...> - 2004-11-10 04:11:24
Attachments:
signature.asc
|
[Felix Wiemann]
> I'm just wondering if it were a good idea to allow these options for
> all directives (not only "unicode") when they occur in a
> substitution definition.

Seems like a good idea to me.

[David Goodger]
>>>> And other characters can be used as markup delimiters, not just
>>>> spaces. For example, hyphens can be used.
...
> On a second thought, it might be very useful indeed. :)
>
>     .. |--| unicode:: U+2013
>        .. en-dash, trimming only hyphens, not spaces
>        :trim: -
>
>     This is an en-dash |--| as you would insert it in German and
>     sometimes in English (mostly UK, I think). And this is a range
>     from 50 to 100: 50-|--|-100; rendered as 50<endash>100, without
>     spaces.
>
> Syntax proposal:
>
>     ":ltrim:" adds ltrim=" " as attribute; ":ltrim: -" adds
>     ltrim="-"; same for any other character. It is not possible to
>     activate trimming of multiple characters (e.g. both spaces and
>     hyphens).
>
>     Same for :rtrim: and :trim:.
>
> What d'you think? Useful or feature creep?

Potentially useful. The "trim" attributes would have to match the
context for the substitution to be applied. And multiple contexts
would have to be supported. So we'd have to support multiple
substitution definitions with the same substitution text but different
trim contexts.

--
David Goodger <http://python.net/~goodger>
|
From: Felix W. <Fel...@gm...> - 2004-11-10 18:10:11
|
David Goodger wrote:

> Felix Wiemann wrote:
>
>>     .. |--| unicode:: U+2013
>>        .. en-dash, trimming only hyphens, not spaces
>>        :trim: -
>
> Potentially useful.

Great.

> The "trim" attributes would have to match the context for the
> substitution to be applied.

Why? Given the definition above, I can insert an en-dash with spaces
around |--| like this |--| and I can insert an en-dash without
spaces-|--|-by surrounding it with hyphens (which is sometimes needed,
too). This is very handy, so in fact I'd rather want the "trim"
attribute *not* to have to match the context of the substitution
reference.

Supporting multiple substitution definitions only differing in their
trim-attributes would add a lot of unnecessary complexity, and I'm not
convinced at all that we are ever going to need it. So I think we
rather shouldn't make substitutions context-sensitive.

--
Felix Wiemann -- http://www.ososo.de/
|
From: David G. <go...@py...> - 2004-11-11 03:56:01
Attachments:
signature.asc
|
[David Goodger]
>> The "trim" attributes would have to match the context for the
>> substitution to be applied.

[Felix Wiemann]
> Why? Given the definition above, I can insert an en-dash with
> spaces around |--| like this |--| and I can insert an en-dash
> without spaces-|--|-by surrounding it with hyphens (which is
> sometimes needed, too).

I misunderstood. I was thinking about my previous proposal for
dashes, like:

    .. |M| unicode:: U+02014 .. EM DASH
       :trim: -

And similarly for spaces:

    .. |emsp| unicode:: U+02003 .. EM SPACE
       :trim:

I had originally thought of this for spaces:

    .. |M| unicode:: U+02003 .. EM SPACE
       :trim:

So "word-|M|-word" would result in an em-dash, and "word |M| word"
would result in an em-space. The substitutions would be
context-sensitive. Perhaps not that great of an idea.

But now that I do understand what you meant, I don't like it so much.
Needing to write extra hyphens in order not to get spaces around an
em-dash is ugly and a kludge. Even target cases, like "50-|--|-100",
are ugly. I'm thinking that ":trim: -" might not be such a good idea
after all.

--
David Goodger <http://python.net/~goodger>
|
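The context-sensitive reading described in this message, where one reference name resolves differently depending on the delimiter around it, might look like the sketch below. It is purely illustrative of an idea the thread ends up doubting. The `DEFINITIONS` table, the `resolve` function and the restriction to hyphen and space delimiters are all assumptions made for the example.

```python
import re

# (reference name, delimiter on both sides) -> replacement text.
DEFINITIONS = {
    ("M", "-"): "\u2014",  # EM DASH when written as word-|M|-word
    ("M", " "): "\u2003",  # EM SPACE when written as word |M| word
}

# Only hyphens and spaces are considered as delimiters in this sketch.
REFERENCE = re.compile(r"(?P<left>[- ])\|(?P<name>[^|\s]+)\|(?P<right>[- ])")


def resolve(source: str, definitions: dict = DEFINITIONS) -> str:
    def replace(match: re.Match) -> str:
        left, right = match.group("left"), match.group("right")
        key = (match.group("name"), left)
        # Apply a definition only if both delimiters agree with its context;
        # the delimiters themselves are consumed by the replacement.
        if left == right and key in definitions:
            return definitions[key]
        return match.group(0)

    return REFERENCE.sub(replace, source)


print(ascii(resolve("word-|M|-word")))
print(ascii(resolve("word |M| word")))
print(ascii(resolve("word-|M| word")))  # mixed delimiters: left unchanged
```

The lookup keyed on both name and delimiter is the "multiple substitution definitions with different trim contexts" bookkeeping that was objected to earlier as unnecessary complexity.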
From: <cj...@sy...> - 2004-11-11 20:07:12
|
In the .../site-packages/tools directory, I have the following command:

... # buildhtml.py ../docs ../docs

and get the response below. I tried the same thing in the archive
directory, with a similar response.

How do I convert the basic .txt stuff to HTML?

Would it be possible to build this into the distutils activity?

Is it intended that the Docutils package will be available as a Debian
package later?

Thanks,

Colin W.

/// Processing directory: ../docs
::: Processing: index.txt
../docs/index.txt:0: (ERROR/3) Document empty; must have contents.
/// Processing directory: ../docs/api
::: Processing: cmdline-tool.txt
../docs/api/cmdline-tool.txt:0: (ERROR/3) Document empty; must have contents.
::: Processing: publisher.txt
../docs/api/publisher.txt:0: (ERROR/3) Document empty; must have contents.
::: Processing: runtime-settings.txt
../docs/api/runtime-settings.txt:0: (ERROR/3) Document empty; must have contents.
/// Processing directory: ../docs/dev
::: Processing: testing.txt
../docs/dev/testing.txt:0: (ERROR/3) Document empty; must have contents.
::: Processing: release.txt
../docs/dev/release.txt:0: (ERROR/3) Document empty; must have contents.
::: Processing: pysource.txt
../docs/dev/pysource.txt:0: (ERROR/3) Document empty; must have contents.
::: Processing: todo.txt
../docs/dev/todo.txt:0: (ERROR/3) Document empty; must have contents.
::: Processing: enthought-rfp.txt
../docs/dev/enthought-rfp.txt:0: (ERROR/3) Document empty; must have contents.
::: Processing: enthought-plan.txt
../docs/dev/enthought-plan.txt:0: (ERROR/3) Document empty; must have contents.
::: Processing: policies.txt
../docs/dev/policies.txt:0: (ERROR/3) Document empty; must have contents.
::: Processing: website.txt
../docs/dev/website.txt:0: (ERROR/3) Document empty; must have contents.
::: Processing: semantics.txt
../docs/dev/semantics.txt:0: (ERROR/3) Document empty; must have contents.
/// Processing directory: ../docs/dev/rst
::: Processing: problems.txt
../docs/dev/rst/problems.txt:0: (ERROR/3) Document empty; must have contents.
::: Processing: alternatives.txt
../docs/dev/rst/alternatives.txt:0: (ERROR/3) Document empty; must have contents.
/// Processing directory: ../docs/ref
::: Processing: transforms.txt
../docs/ref/transforms.txt:0: (ERROR/3) Document empty; must have contents.
::: Processing: doctree.txt
../docs/ref/doctree.txt:15: (ERROR/3) Error in "contents" directive: invalid option data: extension option field body may contain a single paragraph only (option "depth").
.. contents:: :depth: 1
../docs/ref/doctree.txt:222: (ERROR/3) Error in "contents" directive: invalid option data: extension option field body may contain a single paragraph only (option "depth").
.. contents:: :local: :depth: 1
../docs/ref/doctree.txt:4245: (ERROR/3) Error in "contents" directive: invalid option data: extension option field body may contain a single paragraph only (option "depth").
.. contents:: :local: :depth: 1
../docs/ref/doctree.txt:4495: (ERROR/3) Error in "contents" directive: invalid option data: extension option field body may contain a single paragraph only (option "depth").
.. contents:: :local: :depth: 1
../docs/ref/doctree.txt:0: (ERROR/3) Document empty; must have contents.
/// Processing directory: ../docs/ref/rst
::: Processing: restructuredtext.txt
../docs/ref/rst/restructuredtext.txt:0: (ERROR/3) Document empty; must have contents.
::: Processing: introduction.txt
../docs/ref/rst/introduction.txt:0: (ERROR/3) Document empty; must have contents.
::: Processing: roles.txt
../docs/ref/rst/roles.txt:0: (ERROR/3) Document empty; must have contents.
::: Processing: directives.txt
../docs/ref/rst/directives.txt:0: (ERROR/3) Document empty; must have contents.
/// Processing directory: ../docs/peps
::: Processing: pep-0256.txt
../docs/peps/pep-0256.txt:0: (ERROR/3) Document empty; must have contents.
DataError: Document does not begin with an RFC-2822 header; it is not a PEP.
Exiting due to error. Use "--traceback" to diagnose.
Please report errors to <doc...@li...>.
Include "--traceback" output, Docutils version (0.3.5),
Python version (2.3.4), your OS type & version, and the
command line used.
|
From: David G. <go...@py...> - 2004-11-12 14:19:51
Attachments:
signature.asc
|
[cj...@sy...] > In the .../site-packages/tools directory, I have the following > command: > ... # buildhtml.py ../docs ../docs You only need to specify "../docs" once. Try that. > and get the response below. I tried the same thing in the archive > directory, with a similar response. ... [lots of errors like this:] > /// Processing directory: ../docs > ::: Processing: index.txt > ../docs/index.txt:0: (ERROR/3) Document empty; must have contents. *Are* these documents empty? Perhaps the text encoding or line endings have been altered. How did you install Docutils? From what source? Where are all the parts installed? Do you have multiple copies installed? Is PYTHONPATH set? To what? > How do I convert the basic .txt stuff to HTML? Use buildhtml.py to convert a directory full of .txt files. Use rst2html.py to convert one at a time. See <http://docutils.sf.net/docs/user/tools.html>. > Would it be possible to build this into the distutils activity? Yes, it's possible. If and when depends on volunteers. Care to contribute? > Is it intended that the docultils package will be available as a > Debian package later? It is available now: apt-get install python-docutils Current version is 0.3.3 in testing, and 0.3.5 in unstable. Note that the current CVS code is 0.3.6, with more features and fewer bugs (IMHO) than releases. -- David Goodger <http://python.net/~goodger> |
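One way to answer the first two questions above (are the files really empty, and have the line endings or encoding been altered?) is to look at the raw bytes. The following is a rough sketch using only the Python standard library; it does not call Docutils, and the names `inspect_text_file` and `report` are hypothetical helpers made up for this illustration.

```python
import os


def inspect_text_file(path):
    """Return the size and line-ending counts for one file, read as raw bytes."""
    with open(path, "rb") as handle:
        data = handle.read()
    crlf = data.count(b"\r\n")
    return {
        "size": len(data),
        "crlf": crlf,                         # CR LF pairs
        "lone_cr": data.count(b"\r") - crlf,  # CR not followed by LF
        "lf": data.count(b"\n") - crlf,       # LF not preceded by CR
        "utf16_bom": data[:2] in (b"\xff\xfe", b"\xfe\xff"),
    }


def report(directory, suffix=".txt"):
    """Print one summary line per matching file below `directory`."""
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            if name.endswith(suffix):
                path = os.path.join(root, name)
                print(path, inspect_text_file(path))
```

Calling `report("../docs")` would list every .txt file with its byte count. A file that is non-empty on disk but shows only lone carriage returns or a UTF-16 byte-order mark would point toward the altered-encoding explanation rather than a genuinely empty document.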
From: Beni C. <cb...@us...> - 2004-11-12 10:21:17
|
David Goodger wrote: > [David Goodger] > >> The "trim" attributes would have to match the context for the > >> substitution to be applied. > > [Felix Wiemann] > > Why? Given the definition above, I can insert an en-dash with > > spaces around |--| like this |--| and I can insert an en-dash > > without spaces-|--|-by surrounding it with hyphens (which is > > sometimes needed, too). > > I misunderstood. I was thinking about my previous proposal for > dashes, like: > > .. |M| unicode:: U+02014 .. EM DASH > :trim: - > > And similarly for spaces: > > .. |emsp| unicode:: U+02003 .. EM SPACE > :trim: > > I had originally thought of this for spaces: > > .. |M| unicode:: U+02003 .. EM SPACE > :trim: > > So "word-|M|-word" would result in an em-dash, and "word |M| word" > would result in an em-space. Yikes!@~ That's just way too subtle and only convenient for a few very marginal characters. > The substitutions would be context-sensitive. Perhaps not that great > of an idea. > -|M|-1 > But now that I do understand what you meant, I don't like it so much. > Needing to write extra hyphens in order not to get spaces around an > em-dash is ugly and a kludge. Even target cases, like "50-|--|-100", > are ugly. I'm thinking that ":trim: -" might not be such a good idea > after all. > What if you do want a space? Something *is* needed but ":trim: -" doesn't feel like the right thing. Because what if you do want a hyphen? -0 -- "Not just a none, but the None. The definate article. The alpha and omega, unchanging and unwilling to act." --- Chris Cioffi against PEP 336 (Make None Callable). |
From: Felix W. <Fel...@gm...> - 2004-10-18 19:20:22
|
Marcelo Huerta wrote:

> My intention was to say that, convenient as I might find the "---"
> shortcut (and I would really prefer "--", as it's our usual way to
> replace the em dash when writing a text file, en dashes being written
> simply as "-"),

No. There is a difference between dash and en-dash (which is very
important for German, e.g.) which would be lost if en-dashes were
written as single dashes ("-"). "--" for en-dash and "---" for em-dash
is also the way LaTeX does it.

> I wonder how could be easily implemented to avoid inconvenience to a
> Spanish language writer. [E.g. "---He reñido a un posadero."]

After some googling it looks like some people prefer using "foo --- bar"
instead of "foo---bar" in normal text, so there probably shouldn't be
any requirement about leading or trailing alphanumeric characters for
em-dashes anyway. (I.e., there won't be any problem with your
dialog-example.)

Something I forgot in my earlier postings is that the en-dash may also
be needed for sequences ("pp. 15--18"), or compound expressions if one
of the components contains spaces ("post--World War 1").

So the transformation would probably look like this:

* Transform "---" to em-dash if it isn't preceded or followed by a
  dash.

* Transform "--" to en-dash if it's surrounded by whitespace or by
  alphanumeric characters.

* No transformation takes place if one of the dashes is escaped.

To give an example-implementation of what I mean:

------------------------------------------------------------------------------
--- docutils/parsers/rst/states.py.~1.78.~ 2004-10-15 14:58:05.000000000 +0200
+++ docutils/parsers/rst/states.py 2004-10-18 21:00:12.000000000 +0200
@@ -483,7 +483,10 @@
         method, which enables additional interpreted text roles.
         """
 
-        self.implicit_dispatch = [(self.patterns.uri, self.standalone_uri),]
+        self.implicit_dispatch = [
+            (self.patterns.uri, self.standalone_uri),
+            (self.patterns.en_dash, lambda x, y: [nodes.Text(u'\u2013')]),
+            (self.patterns.em_dash, lambda x, y: [nodes.Text(u'\u2014')])]
         """List of (pattern, bound method) tuples, used by
         `self.implicit_inline`."""
 
@@ -680,7 +683,27 @@
               r"""
               %(start_string_prefix)s
               (RFC(-|\s+)?(?P<rfcnum>\d+))
-              %(end_string_suffix)s""" % locals(), re.VERBOSE))
+              %(end_string_suffix)s""" % locals(), re.VERBOSE),
+          en_dash=re.compile(
+              r"""
+              (
+                (?<!\S)             # leading whitespace or nothing
+                --
+                (?=\s|\000[ \n]|$)  # trailing whitespace (possibly
+                                    # escaped) or nothing
+              |
+                (?<=\w)             # leading alphanumeric character
+                --
+                (?=\000?\w)         # trailing alphanumeric character,
+                                    # possibly escaped
+              )
+              """, re.VERBOSE | re.UNICODE),
+          em_dash = re.compile(
+              r"""
+              (?<![\000-])          # No leading escape or dash.
+              ---
+              (?!-)                 # No trailing dash.
+              """, re.VERBOSE))
 
     def quoted_start(self, match):
         """Return 1 if inline markup start-string is 'quoted', 0 if not."""
------------------------------------------------------------------------------

Some examples of what is transformed and what is not transformed
(later, we could also use this for testing if the patch is accepted):

Transformed Dashes
==================

En-dashes
---------

Foo -- bar, 10--20, foo--bar, foo\ --\ bar, foo--\bar.

-- at the beginning, at the end --

--

Em-dashes
---------

Foo --- bar, foo---bar, foo ---bar, foo--- bar, foo---\bar, -\ ---\ -,
foo/---bar, foo---/bar, foo---\\bar, foo\\---\\bar.

---at the beginning, at the end---

---

Untransformed Dashes
====================

En-dashes
---------

Foo-- bar, foo --bar, foo\--bar, foo-\-bar, foo--\ bar, foo/--bar,
foo--/bar, foo/--/bar, "--foo", "bar--", \\--foo, bar--\\.

--at the beginning, at the end--

Em-dashes
---------

Foo----bar, foo-----bar, foo\---bar, foo-\--bar, foo--\-bar.
-- When replying to my email address, please ensure that the mail header contains 'Felix Wiemann'. http://www.ososo.de/ |
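The two patterns from the patch above can be tried outside Docutils. The sketch below copies the regular expressions as posted and applies them with `re.sub`; the function `transform_dashes` is a hypothetical stand-in, and this is an approximation, since the patch itself hooks into the parser's implicit dispatch and processes text piece by piece. The patterns treat a NUL character as the marker for a backslash escape, so the sketch writes it explicitly as "\x00".

```python
import re

EN_DASH = re.compile(
    r"""
    (
      (?<!\S)               # leading whitespace or nothing
      --
      (?=\s|\000[ \n]|$)    # trailing whitespace (possibly escaped) or nothing
    |
      (?<=\w)               # leading alphanumeric character
      --
      (?=\000?\w)           # trailing alphanumeric character, possibly escaped
    )
    """, re.VERBOSE | re.UNICODE)

EM_DASH = re.compile(
    r"""
    (?<![\000-])            # no leading escape marker or dash
    ---
    (?!-)                   # no trailing dash
    """, re.VERBOSE)


def transform_dashes(text):
    """Replace "---" and "--" with em- and en-dashes under the rules listed above."""
    text = EM_DASH.sub("\u2014", text)
    return EN_DASH.sub("\u2013", text)
```

With these definitions, "Foo -- bar" and "10--20" gain en-dashes and "foo---bar" gains an em-dash, while "Foo-- bar", "foo --bar" and "Foo----bar" come back unchanged, in line with the example lists in the message.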
From: Felix W. <Fel...@gm...> - 2004-10-19 00:38:58
|
Marcelo Huerta wrote: > Felix Wiemann wrote: > >> (I.e., there won't be any problem with your dialog-example.) > > It would be problematic for the rendering. Inter-dialog observations > are included in Spanish by inserting emdashes which *must* be in > contact with the text in some part and separated in others; otherwise > it's a syntactic error. For example: > > English version: > > "Yes,", he told me, "I must finish this work right now." I hated his > stupid smile. > > Spanish version, -- instead of emdash: > > --Sí --me dijo él--, tengo que terminar este trabajo ya. --Odiaba su > estúpida sonrisa. In fact these dashes wouldn't be transformed, but they are all en-dashes, not em-dashes. I suppose you mean: ---Sí ---me dijo él---, tengo que terminar este trabajo ya. ---Odiaba su estúpida sonrisa. These dash-groups are all correctly transformed to em-dashes. -- When replying to my email address, please ensure that the mail header contains 'Felix Wiemann'. http://www.ososo.de/ |
From: Felix W. <Fel...@gm...> - 2004-11-13 14:15:02
|
"cj...@sy..." <cj...@sy...> wrote: > In the .../site-packages/tools directory, I have the following > command: That's the wrong directory. It should be docutils/tools/. Hmm. Are you sure you installed Docutils by running "python setup.py install" and not by copying it to /usr/lib/python2.x/site-packages/? -- When replying to my email address, please ensure that the mail header contains 'Felix Wiemann'. http://www.ososo.de/ |
From: Marcelo H. <mg...@sp...> - 2004-10-18 03:53:16
|
On 17/10/2004 at 18:57, Felix Wiemann <Fel...@gm...> said, in the
message "[Docutils-users] Re: rendering ellipsis_":

> I never saw to two dashes surrounded by whitespace in any typeset text.
> I.e., I don't think something like that exists in reality. Same for
> three dashes surrounded by non-whitespace.

> And if someone needs it, he can escape one of the dashes: "foo \-- bar",
> or "foo -\- bar", or "foo-\--bar", or "foo\---bar".

How do you would address the converse situation, meaning, you *need*
to convert "triple dash + nonspace" into "emdash + nonspace"? That's the
way dialogs are written in Spanish. It would be *extremely*
inconvenient to have to escape an space for each line of dialog, for
example.

---He reñido a un posadero.
---¿Por qué? ¿Cuándo? ¿Dónde? ¿Cómo?
---Porque cuando donde como sirven mal, me desespero.

-- 
o-=< Marcelo >=-o
caballo de tiro. Equino de kermesse.
--Del "Bichonario" (Giménez/Wright)
|
From: David G. <go...@py...> - 2004-10-18 05:09:19
Attachments:
signature.asc
|
[Marcelo Huerta] > How do you would address the converse situation, meaning, you *need* > to convert "triple dash + nonspace" into "emdash + nonspace"? That's the > way dialogs are written in Spanish. It would be *extremely* > inconvenient to have to escape an space for each line of dialog, for > example. > > ---He reñido a un posadero. > ---¿Por qué? ¿Cuándo? ¿Dónde? ¿Cómo? > ---Porque cuando donde como sirven mal, me desespero. Marcelo, could you please clarify: in Spanish, is dialogue written with three dashes, or with one em dash? Thanks. -- David Goodger |
From: Marcelo H. <mg...@sp...> - 2004-10-18 12:20:08
|
On 18/10/2004 at 02:08, David Goodger <go...@py...> said, in the
message "[Docutils-users] Rendering emdashes (Was: Re: rendering ellipsis)":

> Marcelo, could you please clarify: in Spanish, is dialogue written
> with three dashes, or with one em dash?

I meant an em dash, of course. Sorry for not being clear. My intention
was to say that, convenient as I might find the "---" shortcut (and I
would really prefer "--", as it's our usual way to replace the em dash
when writing a text file, en dashes being written simply as "-"), I
wonder how could be easily implemented to avoid inconvenience to a
Spanish language writer.

-- 
o-=< Marcelo >=-o
cacharro. Animal joven para contener líquidos.
--Del "Bichonario" (Giménez/Wright)
|