From: David G. <go...@py...> - 2004-10-27 20:11:20
[Felix Wiemann]
> With the current implementations, some documents are specifically
> written for the LaTeX writer (because they rely on the
> dash-transformation) and some are written specifically for the HTML
> writer (because they rely on multiple dashes not to be transformed).

That's bad.

> So we have a problem which needs to be solved.

Yes.  IMO, it's a bug that the LaTeX writer implicitly performs any
dash transformation at all.  It's a dangerous convenience.

> A somewhat radical but nonetheless simple and effective solution
> might be to deactivate the transformation in the LaTeX writer.

+1

> However, then it should be possible to easily enter en-/em-dashes
> with ASCII characters.
>
> * I'd suggest adding built-in substitution definitions for "|--|" to
>   en-dash and "|---|" to em-dash.

I don't know about inserting a set of predefined substitution
definitions into the parser.  But we could certainly include a set of
substitution files in Docutils.  Then the author could do:

    .. include:: <dashes.txt>

See <http://docutils.sf.net/docs/dev/todo.html#misc.include>; more
below.

> * And it would be necessary to write em-dashes without spaces around.

Are you saying that substitution references should not require any
delimiters?  That won't work.  Substitution references are like any
other reST inline markup; the start-string and end-string recognition
rules must apply in order to avoid ambiguity
(http://docutils.sf.net/docs/ref/rst/restructuredtext.html#inline-markup).
This is the best we can do right now:

    $ quicktest.py
    foo\ |---|\ bar
    <document source="<stdin>">
        <paragraph>
            foo
            <substitution_reference refname="---">
                ---
            bar

> IMO the trailing space should be made omittable.

We'd still need a leading space.  With an omissible trailing space,
the best we'd be able to do would be

    foo\ |---|bar

That isn't much better than the current "foo\ |---|\ bar".  Certainly
not worth the ambiguity and effort.

But this gave me an idea.
In conjunction with a change to the "unicode" directive, substitutions
could become context-sensitive.  We could add a "trim" option to the
"unicode" directive, as follows:

    .. |--| unicode:: U+02013 .. EN DASH
       :trim:
    .. |---| unicode:: U+02014 .. EM DASH
       :trim:

Then this input:

    foo |---| bar

could become this output:

    foo—bar

And other characters can be used as markup delimiters, not just
spaces.  For example, hyphens can be used.  Alternative substitution
definitions I'm thinking of include:

    .. |M| unicode:: U+02014 .. EM DASH
       :trim: -
    .. |N| unicode:: U+02013 .. EN DASH
       :trim: -
    .. |?| unicode:: U+000AD .. SOFT HYPHEN
       :trim: -
    .. |!| unicode:: U+02011 .. NON-BREAKING HYPHEN
       :trim: -
    .. |#| unicode:: U+02012 .. FIGURE DASH
       :trim: -

So an em-dash could be written like this, similar to the
proofreaders' mark:

    foo-|M|-bar

and would produce (the equivalent of) this:

    foo—bar

Alternatively, XML entity names (|mdash|) could be used instead of the
cryptic symbols above (|M|).

Many space characters could also be defined:

    .. |emsp| unicode:: U+02003 .. EM SPACE
       :trim:
    .. |ensp| unicode:: U+02002 .. EN SPACE
       :trim:
    .. |puncsp| unicode:: U+02008 .. PUNCTUATION SPACE
       :trim:
    .. |numsp| unicode:: U+02007 .. DIGIT SPACE
       :trim:
    .. |thinsp| unicode:: U+02009 .. THIN SPACE
       :trim:
    .. |hairsp| unicode:: U+0200A .. HAIR SPACE
       :trim:
    .. |0sp| unicode:: U+0200B .. ZERO WIDTH SPACE
       :trim:
    .. |zwnj| unicode:: U+0200C .. ZERO WIDTH NON-JOINER
       :trim:
    .. |zwj| unicode:: U+0200D .. ZERO WIDTH JOINER
       :trim:
    .. |nbsp| unicode:: U+000A0 .. NO-BREAK SPACE
       :trim:

In fact, all of the character entity files in the add-on package
(http://docutils.sourceforge.net/tmp/charents.tgz, which should come
standard with Docutils) could have space-trimmed alternatives.

Discussion welcome.

-- 
David Goodger <http://python.net/~goodger>
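
For discussion purposes, here is a rough standalone sketch of the trimming behaviour proposed above. It is not Docutils code and does not use the Docutils parser; the `DEFINITIONS` table and the `substitute` function are hypothetical names invented for illustration. It assumes that a bare ":trim:" removes whitespace on both sides of the reference, and that ":trim: -" removes a single hyphen on each side.

```python
import re

# Hypothetical table: substitution name -> (replacement text, delimiter).
# A delimiter of None stands for the proposed bare ":trim:" (whitespace);
# a one-character string stands for the proposed ":trim: -" form.
DEFINITIONS = {
    "---": ("\u2014", None),  # EM DASH, ":trim:"
    "--": ("\u2013", None),   # EN DASH, ":trim:"
    "M": ("\u2014", "-"),     # EM DASH, ":trim: -"
}


def substitute(text, definitions=DEFINITIONS):
    """Replace |name| references and drop the delimiters around them."""
    for name, (replacement, delimiter) in definitions.items():
        # Whitespace runs by default, else one literal delimiter per side.
        edge = r"\s+" if delimiter is None else re.escape(delimiter)
        pattern = edge + re.escape("|" + name + "|") + edge
        # A function replacement avoids backslash processing in re.sub.
        text = re.sub(pattern, lambda match: replacement, text)
    return text


print(substitute("foo |---| bar"))
print(substitute("foo-|M|-bar"))
```

Under those assumptions both calls print "foo" and "bar" joined by a single U+2014 character. A real implementation would have to work on the parsed document tree rather than on raw text, so that literal blocks and escaped characters are left alone.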