[Docutils-checkins] SF.net SVN: docutils:[8886] trunk/docutils


Revision: 8886
          http://sourceforge.net/p/docutils/code/8886
Author:   milde
Date:     2021-11-14 22:00:50 +0000 (Sun, 14 Nov 2021)
Log Message:
-----------
Update documentation and handling of (East Asian) wide characters.

Explain the special casing for combining and wide characters in
section titles.

cf. bug #433

Simplify import (we no longer need to handle a missing
`unicodedata.east_asian_width` in Python < 2.4).

Modified Paths:
--------------
    trunk/docutils/docs/ref/rst/restructuredtext.txt
    trunk/docutils/docutils/statemachine.py
    trunk/docutils/docutils/utils/__init__.py
    trunk/docutils/test/test_parsers/test_rst/test_east_asian_text.py

Modified: trunk/docutils/docs/ref/rst/restructuredtext.txt
===================================================================

--- trunk/docutils/docs/ref/rst/restructuredtext.txt	2021-11-11 16:29:16 UTC (rev 8885)
+++ trunk/docutils/docs/ref/rst/restructuredtext.txt	2021-11-14 22:00:50 UTC (rev 8886)
@@ -495,7 +495,7 @@
 matching "overlines" above the title.  An underline/overline is a
 single repeated punctuation character that begins in column 1 and
 forms a line extending at least as far as the right edge of the title
-text.  Specifically, an underline/overline character may be any
+text. [#]_  Specifically, an underline/overline character may be any
 non-alphanumeric printable 7-bit ASCII character [#]_.  When an
 overline is used, the length and character used must match the
 underline.  Underline-only adornment styles are distinct from
@@ -503,6 +503,10 @@
 be any number of levels of section titles, although some output
 formats may have limits (HTML has 6 levels).
 
+.. [#] The key is the visual length of the title in a mono-spaced font.
+   The adornment may need more or fewer characters than the title if the
+   title contains wide__ or combining__ characters.
+
 .. [#] The following are all valid section title adornment
    characters::
 
@@ -513,6 +517,9 @@
 
        = - ` : . ' " ~ ^ _ * + #
 
+__ https://en.wikipedia.org/wiki/Halfwidth_and_fullwidth_forms#In_Unicode
+__ https://en.wikipedia.org/wiki/Combining_character
+
 Rather than imposing a fixed number and order of section title
 adornment styles, the order enforced will be the order as encountered.
 The first style encountered will be an outermost title (like HTML H1),
@@ -1020,8 +1027,8 @@
 Doctree elements: option_list_, option_list_item_, option_group_, option_,
 option_string_, option_argument_, description_.
 
-Option lists are two-column lists of command-line options and
-descriptions, documenting a program's options.  For example::
+Option lists map a program's command-line options to descriptions
+documenting them.  For example::
 
     -a         Output all.
     -b         Output both (this description is
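The footnote added in the first hunk says an underline must reach the title's *visual* width in a mono-spaced font, not its character count. The following is a minimal Python 3 sketch of that rule. It assumes that East Asian Width values 'W' and 'F' occupy two columns, that combining characters (non-zero `unicodedata.combining`) occupy none, and that the adornment uses 7-bit ASCII characters, so `len()` equals its width. `display_width` and `underline_reaches` are hypothetical helpers, not Docutils API.

```python
import unicodedata

def display_width(text):
    """Columns `text` occupies in a mono-spaced font (simplified sketch)."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # drawn over the preceding character, no column of its own
        # Assumption: 'W'ide and 'F'ullwidth characters take two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1
    return width

def underline_reaches(title, underline):
    """True if the ASCII `underline` extends at least to the title's right edge."""
    return len(underline) >= display_width(title)

print(underline_reaches('漢字', '===='))        # 2 wide characters need 4 columns
print(underline_reaches('漢字', '=='))          # as many characters as the title, too short
print(underline_reaches('Cafe\u0301', '===='))  # 5 code points but only 4 columns
```

The last two calls illustrate both directions the footnote mentions. Wide characters need a longer adornment than the character count suggests, and combining characters need a shorter one.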

Modified: trunk/docutils/docutils/statemachine.py
===================================================================
--- trunk/docutils/docutils/statemachine.py	2021-11-11 16:29:16 UTC (rev 8885)
+++ trunk/docutils/docutils/statemachine.py	2021-11-14 22:00:50 UTC (rev 8886)
@@ -109,7 +109,8 @@
 
 import sys
 import re
-import unicodedata
+from unicodedata import east_asian_width
+
 from docutils import utils
 from docutils.utils.error_reporting import ErrorOutput
 
@@ -1446,7 +1447,6 @@
         Pad all double-width characters in self by appending `pad_char` to each.
         For East Asian language support.
         """
-        east_asian_width = unicodedata.east_asian_width
         for i in range(len(self.data)):
             line = self.data[i]
             if isinstance(line, unicode):
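The hunk shows only the start of `pad_double_width()`. Its docstring says it appends `pad_char` after each double-width character, which makes one string index line up with one display column. Below is a standalone Python 3 sketch of that idea over a plain list of strings. The name `pad_double_width_lines` is hypothetical, and treating 'W' and 'F' as double-width is an assumption. The sketch uses the same `from unicodedata import east_asian_width` import this commit introduces.

```python
from unicodedata import east_asian_width

def pad_double_width_lines(lines, pad_char):
    """Return `lines` with `pad_char` appended after each double-width char.

    Hypothetical standalone version: 'W' (wide) and 'F' (fullwidth)
    characters are treated as double-width.
    """
    padded = []
    for line in lines:
        new = []
        for char in line:
            new.append(char)
            if east_asian_width(char) in ('W', 'F'):
                new.append(pad_char)  # occupy the character's second column
        padded.append(''.join(new))
    return padded

lines = pad_double_width_lines(['漢字 ab'], '\x00')
print(len(lines[0]))  # 7, the display width of '漢字 ab'
```

Returning a new list, rather than rewriting `self.data` in place as the method does, keeps the sketch independent of the `StringList` class.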

Modified: trunk/docutils/docutils/utils/__init__.py
===================================================================
--- trunk/docutils/docutils/utils/__init__.py	2021-11-11 16:29:16 UTC (rev 8885)
+++ trunk/docutils/docutils/utils/__init__.py	2021-11-14 22:00:50 UTC (rev 8886)
@@ -641,7 +641,7 @@
     Correct ``len(text)`` for wide East Asian and combining Unicode chars.
     """
     if isinstance(text, str) and sys.version_info < (3, 0):
-        return len(text)
+        return len(text) # shortcut for binary strings
     width = sum([east_asian_widths[unicodedata.east_asian_width(c)]
                  for c in text])
     # correction for combining chars:
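The hunk shows `column_width()` summing a per-character value from `east_asian_widths` and then applying a "correction for combining chars", but it does not show the mapping or the correction itself. The following Python 3 sketch has the same shape. The mapping values (two columns for 'W' and 'F', one otherwise) and the flat subtraction of one column per combining character are assumptions. The Python 2 byte-string shortcut is omitted.

```python
import unicodedata

# Assumed column counts per East Asian Width property value.
EAST_ASIAN_WIDTHS = {'W': 2, 'F': 2, 'Na': 1, 'H': 1, 'N': 1, 'A': 1}

def column_width(text):
    """Return the number of mono-spaced columns `text` occupies (sketch)."""
    width = sum(EAST_ASIAN_WIDTHS[unicodedata.east_asian_width(c)]
                for c in text)
    # Correction for combining chars: each was counted in the sum above,
    # so subtract one column per combining character.
    width -= sum(1 for c in text if unicodedata.combining(c))
    return width

title = '漢字'
print(title)
print('=' * column_width(title))  # adornment sized by columns, not len()
```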

Modified: trunk/docutils/test/test_parsers/test_rst/test_east_asian_text.py
===================================================================
--- trunk/docutils/test/test_parsers/test_rst/test_east_asian_text.py	2021-11-11 16:29:16 UTC (rev 8885)
+++ trunk/docutils/test/test_parsers/test_rst/test_east_asian_text.py	2021-11-14 22:00:50 UTC (rev 8886)
@@ -14,14 +14,9 @@
     import __init__
 from test_parsers import DocutilsTestSupport
 
-import unicodedata
+from unicodedata import east_asian_width
 
-try:
-    east_asian_width = unicodedata.east_asian_width
-except AttributeError:
-    east_asian_width = None
 
-
 def suite():
     s = DocutilsTestSupport.ParserTestSuite()
     s.generateTests(totest)
