From: Felix W. <Fel...@gm...> - 2004-10-18 19:20:22
Marcelo Huerta wrote:

> My intention was to say that, convenient as I might find the "---"
> shortcut (and I would really prefer "--", as it's our usual way to
> replace the em dash when writing a text file, en dashes being written
> simply as "-"),

No.  There is a difference between a plain dash (hyphen) and an en-dash
(which is very important for German, e.g.), and it would be lost if
en-dashes were written as single dashes ("-").  "--" for en-dash and
"---" for em-dash is also the way LaTeX does it.

> I wonder how could be easily implemented to avoid inconvenience to a
> Spanish language writer.  [E.g. "---He reñido a un posadero."]

After some googling it looks like some people prefer "foo --- bar" to
"foo---bar" in normal text, so there probably shouldn't be any
requirement about leading or trailing alphanumeric characters for
em-dashes anyway.  (I.e., there won't be any problem with your dialogue
example.)

Something I forgot in my earlier postings is that the en-dash may also
be needed for sequences ("pp. 15--18"), or for compound expressions if
one of the components contains spaces ("post--World War 1").

So the transformation would probably look like this:

* Transform "---" to em-dash if it isn't preceded or followed by a
  dash.

* Transform "--" to en-dash if it's surrounded by whitespace or by
  alphanumeric characters.

* No transformation takes place if one of the dashes is escaped.

To give an example implementation of what I mean:

------------------------------------------------------------------------------
--- docutils/parsers/rst/states.py.~1.78.~	2004-10-15 14:58:05.000000000 +0200
+++ docutils/parsers/rst/states.py	2004-10-18 21:00:12.000000000 +0200
@@ -483,7 +483,10 @@
         method, which enables additional interpreted text roles.
         """
 
-        self.implicit_dispatch = [(self.patterns.uri, self.standalone_uri),]
+        self.implicit_dispatch = [
+            (self.patterns.uri, self.standalone_uri),
+            (self.patterns.en_dash, lambda x, y: [nodes.Text(u'\u2013')]),
+            (self.patterns.em_dash, lambda x, y: [nodes.Text(u'\u2014')])]
         """List of (pattern, bound method) tuples, used by
         `self.implicit_inline`."""
 
@@ -680,7 +683,27 @@
                r"""
                %(start_string_prefix)s
                (RFC(-|\s+)?(?P<rfcnum>\d+))
-               %(end_string_suffix)s""" % locals(), re.VERBOSE))
+               %(end_string_suffix)s""" % locals(), re.VERBOSE),
+          en_dash=re.compile(
+               r"""
+               (
+                 (?<!\S)             # leading whitespace or nothing
+                 --
+                 (?=\s|\000[ \n]|$)  # trailing whitespace (possibly
+                                     # escaped) or nothing
+               |
+                 (?<=\w)             # leading alphanumeric character
+                 --
+                 (?=\000?\w)         # trailing alphanumeric character,
+                                     # possibly escaped
+               )
+               """, re.VERBOSE | re.UNICODE),
+          em_dash = re.compile(
+               r"""
+               (?<![\000-])          # No leading escape or dash.
+               ---
+               (?!-)                 # No trailing dash.
+               """, re.VERBOSE))
 
     def quoted_start(self, match):
         """Return 1 if inline markup start-string is 'quoted', 0 if not."""
------------------------------------------------------------------------------

Some examples of what is transformed and what is not transformed
(later, we could also use this for testing if the patch is accepted):

Transformed Dashes
==================

En-dashes
---------

Foo -- bar, 10--20, foo--bar, foo\ --\ bar, foo--\bar.

-- at the beginning, at the end --

--

Em-dashes
---------

Foo --- bar, foo---bar, foo ---bar, foo--- bar, foo---\bar, -\ ---\ -,
foo/---bar, foo---/bar, foo---\\bar, foo\\---\\bar.

---at the beginning, at the end---

---

Untransformed Dashes
====================

En-dashes
---------

Foo-- bar, foo --bar, foo\--bar, foo-\-bar, foo--\ bar, foo/--bar,
foo--/bar, foo/--/bar, "--foo", "bar--", \\--foo, bar--\\.

--at the beginning, at the end--

Em-dashes
---------

Foo----bar, foo-----bar, foo\---bar, foo-\--bar, foo--\-bar.
-- 
When replying to my email address, please ensure that the mail header
contains 'Felix Wiemann'.

http://www.ososo.de/
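
For experimenting with the three rules outside docutils, here is a rough standalone sketch. It reuses the two patterns from the patch earlier in this message, but everything else is an assumption made for illustration: the helper names (escape_to_null, strip_nulls, convert_dashes) are hypothetical, backslash escapes are modelled as a NUL character in front of the escaped character (which is what the \000 in the patterns appears to rely on), and the two substitutions are simply run one after the other rather than through the parser's implicit_dispatch list.

```python
import re

EN_DASH = "\u2013"
EM_DASH = "\u2014"

# Same patterns as in the patch; \000 marks a backslash-escaped character.
EN_DASH_RE = re.compile(r"""
    (?<!\S) -- (?=\s|\000[ \n]|$)   # surrounded by whitespace or nothing
    |
    (?<=\w) -- (?=\000?\w)          # surrounded by alphanumeric characters
    """, re.VERBOSE)
EM_DASH_RE = re.compile(r"(?<![\000-])---(?!-)")


def escape_to_null(text):
    """Hypothetical helper: rewrite each backslash escape as NUL + character."""
    return re.sub(r"\\(.)", lambda m: "\x00" + m.group(1), text,
                  flags=re.DOTALL)


def strip_nulls(text):
    """Hypothetical helper: drop NUL markers; escaped whitespace vanishes."""
    return re.sub(r"\x00[ \n]?", "", text)


def convert_dashes(text):
    """Apply the em-dash rule, then the en-dash rule, then remove escapes."""
    text = escape_to_null(text)
    text = EM_DASH_RE.sub(EM_DASH, text)
    text = EN_DASH_RE.sub(EN_DASH, text)
    return strip_nulls(text)


if __name__ == "__main__":
    samples = ["pp. 15--18", "Foo --- bar", r"foo\ --\ bar",
               "Foo----bar", r"foo\---bar"]
    for sample in samples:
        print(repr(sample), "->", repr(convert_dashes(sample)))
```

Running the em-dash substitution first seems harmless here, since the en-dash pattern never matches inside a run of three or more dashes, but that ordering is a choice of this sketch and not something the patch prescribes.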