[Docutils-checkins] SF.net SVN: docutils:[9077] trunk/docutils

Revision: 9077
          http://sourceforge.net/p/docutils/code/9077
Author:   milde
Date:     2022-06-17 11:31:28 +0000 (Fri, 17 Jun 2022)
Log Message:
-----------
Documentation update

Remove dead link and outdated footnote about limitations in Python2.
Add link to acceptable values of encoding error handlers.

Harmonise help output.

Use UTF-8 in prose text, error messages, and documentation.
Use 'utf-8' in code or when referring to code.
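The UTF-8 / 'utf-8' convention above is editorial only. Python's codec registry resolves the common spellings to the same codec, and that codec reports 'utf-8' as its canonical name. A minimal check using the standard `codecs` module:

```python
import codecs

# Every common spelling of the UTF-8 name resolves to one codec,
# whose canonical .name is the lowercase, hyphenated 'utf-8'.
for alias in ("UTF-8", "utf-8", "utf8", "UTF8"):
    print(alias, "->", codecs.lookup(alias).name)
```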

Modified Paths:
--------------
    trunk/docutils/FAQ.txt
    trunk/docutils/docs/ref/rst/directives.txt
    trunk/docutils/docs/user/config.txt
    trunk/docutils/docutils/frontend.py
    trunk/docutils/test/data/help/docutils.txt
    trunk/docutils/test/test_functional.py
    trunk/docutils/test/test_io.py

Modified: trunk/docutils/FAQ.txt
===================================================================

--- trunk/docutils/FAQ.txt	2022-06-17 11:31:17 UTC (rev 9076)
+++ trunk/docutils/FAQ.txt	2022-06-17 11:31:28 UTC (rev 9077)
@@ -1099,12 +1099,9 @@
 * `Python Unicode Tutorial
   <http://www.reportlab.com/i18n/python_unicode_tutorial.html>`_
 
-* `Python Unicode Objects: Some Observations on Working With Non-ASCII
-  Character Sets <http://effbot.org/zone/unicode-objects.htm>`_
-
 The common case is with the default output encoding (UTF-8), when
 either numbered sections are used (via the "sectnum_" directive) or
-symbol-footnotes.  3 non-breaking spaces are inserted in each numbered
+symbol-footnotes.  Three non-breaking spaces are inserted in each numbered
 section title, between the generated number and the title text.  Most
 footnote symbols are not available in ASCII, nor are non-breaking
 spaces.  When encoded with UTF-8 and viewed with ordinary ASCII tools,
@@ -1111,14 +1108,12 @@
 these characters will appear to be multi-character garbage.
 
 You may have an decoding problem in your browser (or editor, etc.).
-The encoding of the output is set to "utf-8", but your browser isn't
+The encoding of the output is set to UTF-8, but your browser isn't
 recognizing that.  You can either try to fix your browser (enable
 "UTF-8 character set", sometimes called "Unicode"), or choose a
-different encoding for the HTML output.  You can also try
+different `output-encoding`_.  You can also try
 ``--output-encoding=ascii:xmlcharrefreplace`` for HTML or XML, but not
-applicable to non-XMLish outputs (if using runtime
-settings/configuration files, use ``output_encoding=ascii`` and
-``output_encoding_error_handler=xmlcharrefreplace``).
+applicable to non-XMLish outputs.
 
 If you're generating document fragments, the "Content-Type" metadata
 (between the HTML ``<head>`` and ``</head>`` tags) must agree with the
@@ -1132,6 +1127,7 @@
     <?xml version="1.0" encoding="utf-8" ?>
 
 .. _sectnum: docs/ref/rst/directives.html#sectnum
+.. _output-encoding: docs/user/config.html#output-encoding
 
 
 How can I retrieve the body of the HTML document?

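The FAQ entry above explains why numbered section titles look garbled in ASCII tools. The sketch below reproduces the effect with plain Python string methods; the title string is an illustrative assumption (a generated number followed by three NO-BREAK SPACEs, U+00A0), not output captured from Docutils.

```python
# Illustrative numbered title: number, three U+00A0 NO-BREAK SPACEs, text.
title = "1.2\u00a0\u00a0\u00a0Introduction"

# In UTF-8 each U+00A0 becomes the two bytes 0xC2 0xA0.
utf8_bytes = title.encode("utf-8")
print(utf8_bytes)

# A tool assuming a single-byte encoding shows each pair as two
# characters ("Â" plus a space): the "multi-character garbage".
print(utf8_bytes.decode("latin-1"))

# Equivalent of --output-encoding=ascii:xmlcharrefreplace: non-ASCII
# characters become numeric character references.
print(title.encode("ascii", "xmlcharrefreplace").decode("ascii"))
# -> 1.2&#160;&#160;&#160;Introduction
```

The character references are only meaningful to HTML/XML consumers, which is why the FAQ limits that option to XML-ish output.
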
Modified: trunk/docutils/docs/ref/rst/directives.txt
===================================================================
--- trunk/docutils/docs/ref/rst/directives.txt	2022-06-17 11:31:17 UTC (rev 9076)
+++ trunk/docutils/docs/ref/rst/directives.txt	2022-06-17 11:31:28 UTC (rev 9077)
@@ -884,7 +884,7 @@
     The horizontal alignment of the table. (New in Docutils 0.13)
 
 ``delim`` : char | "tab" | "space" [#whitespace-delim]_
-    A one-character string\ [#ASCII-char]_ used to separate fields.
+    A one-character string used to separate fields.
     Defaults to ``,`` (comma).  May be specified as a Unicode code
     point; see the unicode_ directive for syntax details.
 
@@ -893,7 +893,7 @@
     Defaults to the document's input_encoding_.
 
 ``escape`` : char
-    A one-character\ [#ASCII-char]_ string used to escape the
+    A one-character string used to escape the
     delimiter or quote characters.  May be specified as a Unicode
     code point; see the unicode_ directive for syntax details.  Used
     when the delimiter is used in an unquoted field, or when quote
@@ -920,7 +920,7 @@
     significant.  The default is to ignore such whitespace.
 
 ``quote`` : char
-    A one-character string\ [#ASCII-char]_ used to quote elements
+    A one-character string used to quote elements
     containing the delimiter or which start with the quote
     character.  Defaults to ``"`` (quote).  May be specified as a
     Unicode code point; see the unicode_ directive for syntax
@@ -950,13 +950,7 @@
 .. [#whitespace-delim] Whitespace delimiters are supported only for external
    CSV files.
 
-.. [#ASCII-char] With Python 2, the values for the ``delimiter``,
-   ``quote``, and ``escape`` options must be ASCII characters. (The csv
-   module does not support Unicode and all non-ASCII characters are
-   encoded as multi-byte utf-8 string). This limitation does not exist
-   under Python 3.
 
-
 List Table
 ==========
 

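The removed footnote described a Python 2 limitation of the `csv` module. Under Python 3, `csv` accepts any single Unicode character as delimiter or quote character. This sketch calls the standard `csv` module directly rather than the csv-table directive; that the directive forwards ``delim``, ``quote``, and ``escape`` to `csv` in this way is an assumption here.

```python
import csv
import io

# SECTION SIGN (U+00A7) as delimiter, RIGHT GUILLEMET (U+00BB) as quote
# character: both non-ASCII, both accepted by Python 3's csv module.
data = "name\u00a7note\nspam\u00a7\u00bbeggs\u00a7ham\u00bb\n"

reader = csv.reader(io.StringIO(data), delimiter="\u00a7", quotechar="\u00bb")
rows = list(reader)

# The quoted second field keeps its embedded delimiter.
print(rows)
```
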
Modified: trunk/docutils/docs/user/config.txt
===================================================================
--- trunk/docutils/docs/user/config.txt	2022-06-17 11:31:17 UTC (rev 9076)
+++ trunk/docutils/docs/user/config.txt	2022-06-17 11:31:28 UTC (rev 9077)
@@ -298,8 +298,9 @@
 error_encoding_error_handler
 ----------------------------
 
-The error handler for unencodable characters in error output.  See
-output_encoding_error_handler_ for acceptable values.
+The error handler for unencodable characters in error output.
+Acceptable values are the `Error Handlers`_ of Python's "encoding" module.
+See also output_encoding_error_handler_.
 
 Default: "backslashreplace"
 Options: ``--error-encoding-error-handler, --error-encoding, -e``.
@@ -373,8 +374,9 @@
 input_encoding_error_handler
 ----------------------------
 
-The error handler for undecodable characters in the input. Acceptable
-values include:
+The error handler for undecodable characters in the input.
+Acceptable values are the `Error Handlers`_ of Python's "encoding" module,
+including:
 
 strict
     Raise an exception in case of an encoding error.
@@ -384,10 +386,6 @@
 ignore
     Ignore malformed data and continue without further notice.
 
-Acceptable values are the same as for the "error" parameter of
-Python's ``unicode`` function; other values may be defined in
-applications or in future versions of Python.
-
 Default: "strict".
 Options: ``--input-encoding-error-handler, --input-encoding, -i``.
 
@@ -421,13 +419,14 @@
 
 The text encoding for output.
 
-Default: "UTF-8".  Options: ``--output-encoding, -o``.
+Default: "utf-8".  Options: ``--output-encoding, -o``.
 
 output_encoding_error_handler
 -----------------------------
 
-The error handler for unencodable characters in the output. Acceptable
-values include:
+The error handler for unencodable characters in the output.
+Acceptable values are the `Error Handlers`_ of Python's "encoding" module,
+including:
 
 strict
     Raise an exception in case of an encoding error.
@@ -442,10 +441,6 @@
 backslashreplace
     Replace with backslash escape sequences, such as "``\u2020``".
 
-Acceptable values are the same as for the "error" parameter of
-Python's ``encode`` string method; other values may be defined in
-applications or in future versions of Python.
-
 Default: "strict".
 Options: ``--output-encoding-error-handler, --output-encoding, -o``.
 
@@ -455,7 +450,7 @@
 Path to a file where Docutils will write a list of files that were
 required to generate the output, e.g. included files or embedded
 stylesheets [#dependencies]_. [#pwd]_ The format is one path per
-line with forward slashes as separator, the encoding is ``utf8``.
+line with forward slashes as separator, the encoding is UTF-8.
 
 Set to ``-`` in order to write dependencies to stdout.
 
@@ -465,10 +460,10 @@
   ham.html: ham.txt $(shell cat hamdeps.txt)
     rst2html.py --record-dependencies=hamdeps.txt ham.txt ham.html
 
-If the filesystem encoding differs from utf8, replace the ``cat``
+If the filesystem encoding differs from UTF-8, replace the ``cat``
 command with a call to a converter, e.g.::
 
-  $(shell iconv -f utf8 -t latin1 hamdeps.txt)
+  $(shell iconv -f utf-8 -t latin1 hamdeps.txt)
 
 Default: None.  Option: ``--record-dependencies``.
 
@@ -2336,3 +2331,6 @@
 .. _option lists: ../ref/rst/restructuredtext.html#option-lists
 .. _tables: ../ref/rst/restructuredtext.html#tables
 .. _table of contents: ../ref/rst/directives.html#contents
+
+.. _Error Handlers: 
+   https://docs.python.org/3/library/codecs.html#error-handlers

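The new config.txt text points to the Error Handlers section of Python's codecs documentation. The handler names listed there are the ones accepted by `str.encode` and `bytes.decode`, so their effect can be tried directly. A small sketch with an arbitrary sample string:

```python
text = "caf\u00e9 \u2020"  # 'café †': neither é nor † exists in ASCII

# Output side (encoding), as for output_encoding_error_handler:
for handler in ("ignore", "replace", "xmlcharrefreplace", "backslashreplace"):
    print(handler, text.encode("ascii", handler))

# "strict" raises instead of substituting anything.
try:
    text.encode("ascii", "strict")
except UnicodeEncodeError as err:
    print("strict:", err.reason)

# Input side (decoding), as for input_encoding_error_handler:
raw = b"caf\xe9"  # latin-1 bytes, not valid UTF-8
print(raw.decode("utf-8", "replace"))  # U+FFFD replaces the bad byte
print(raw.decode("utf-8", "ignore"))   # the bad byte is dropped
```
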
Modified: trunk/docutils/docutils/frontend.py
===================================================================
--- trunk/docutils/docutils/frontend.py	2022-06-17 11:31:17 UTC (rev 9076)
+++ trunk/docutils/docutils/frontend.py	2022-06-17 11:31:28 UTC (rev 9077)
@@ -562,7 +562,7 @@
          ('Disable Python tracebacks.  (default)',
           ['--no-traceback'], {'dest': 'traceback', 'action': 'store_false'}),
          ('Specify the encoding and optionally the '
-          'error handler of input text.  Default: <locale-dependent>:strict.',
+          'error handler of input text.  Default: <auto-detect>:strict.',
           ['--input-encoding', '-i'],
           {'metavar': '<name[:handler]>',
            'validator': validate_encoding_and_error_handler}),
@@ -580,8 +580,8 @@
           '"xmlcharrefreplace", "backslashreplace".',
           ['--output-encoding-error-handler'],
           {'default': 'strict', 'validator': validate_encoding_error_handler}),
-         ('Specify text encoding and error handler for error output.  '
-          'Default: %s:%s.'
+         ('Specify text encoding and optionally error handler '
+          'for error output.  Default: %s:%s.'
           % (default_error_encoding, default_error_encoding_error_handler),
           ['--error-encoding', '-e'],
           {'metavar': '<name[:handler]>', 'default': default_error_encoding,

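The harmonised help strings describe option values of the form `<name[:handler]>`, where the handler part is optional. The sketch below shows one way such a value can be split and checked with the standard `codecs` registry. `split_encoding_spec` is a hypothetical helper for illustration; it is not Docutils' `validate_encoding_and_error_handler`.

```python
import codecs


def split_encoding_spec(value, default_handler="strict"):
    """Split '<name[:handler]>' into (encoding, handler).

    Hypothetical helper for illustration only.
    """
    name, sep, handler = value.partition(":")
    if not sep:
        # No ':handler' part given: fall back to the option's default.
        handler = default_handler
    codecs.lookup(name)            # raises LookupError for unknown encodings
    codecs.lookup_error(handler)   # raises LookupError for unknown handlers
    return name, handler
```

For example, `split_encoding_spec("ascii:xmlcharrefreplace")` keeps both parts, while `split_encoding_spec("utf-8", "backslashreplace")` mirrors the documented `--error-encoding` default of `utf-8:backslashreplace`.
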
Modified: trunk/docutils/test/data/help/docutils.txt
===================================================================
--- trunk/docutils/test/data/help/docutils.txt	2022-06-17 11:31:17 UTC (rev 9076)
+++ trunk/docutils/test/data/help/docutils.txt	2022-06-17 11:31:28 UTC (rev 9077)
@@ -55,7 +55,7 @@
 --no-traceback          Disable Python tracebacks.  (default)
 --input-encoding=<name[:handler]>, -i <name[:handler]>
                         Specify the encoding and optionally the error handler
-                        of input text.  Default: <locale-dependent>:strict.
+                        of input text.  Default: <auto-detect>:strict.
 --input-encoding-error-handler=INPUT_ENCODING_ERROR_HANDLER
                         Specify the error handler for undecodable characters.
                         Choices: "strict" (default), "ignore", and "replace".
@@ -67,8 +67,8 @@
                         characters; "strict" (default), "ignore", "replace",
                         "xmlcharrefreplace", "backslashreplace".
 --error-encoding=<name[:handler]>, -e <name[:handler]>
-                        Specify text encoding and error handler for error
-                        output.  Default: utf-8:backslashreplace.
+                        Specify text encoding and optionally error handler for
+                        error output.  Default: utf-8:backslashreplace.
 --error-encoding-error-handler=ERROR_ENCODING_ERROR_HANDLER
                         Specify the error handler for unencodable characters
                         in error output.  Default: backslashreplace.

Modified: trunk/docutils/test/test_functional.py
===================================================================
--- trunk/docutils/test/test_functional.py	2022-06-17 11:31:17 UTC (rev 9076)
+++ trunk/docutils/test/test_functional.py	2022-06-17 11:31:28 UTC (rev 9077)
@@ -157,7 +157,7 @@
         no_expected = self.no_expected_template % {
             'exp': expected_path, 'out': params['destination_path']}
         self.assertTrue(os.access(expected_path, os.R_OK), no_expected)
-        # samples are UTF8 encoded. 'rb' leads to errors with Python 3!
+        # samples are UTF-8 encoded. 'rb' leads to errors with Python 3!
         f = open(expected_path, 'r', encoding='utf-8')
         # Normalize line endings:
         expected = '\n'.join(f.read().splitlines())

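The test comment refers to reading the UTF-8 samples in text mode. In binary mode `read()` returns `bytes`, which never compare equal to the `str` output under Python 3. The sketch below (a temporary file with made-up content) shows text-mode reading with an explicit encoding and the `splitlines()`/`join` line-ending normalization used in the test.

```python
import os
import tempfile

# Write a small UTF-8 sample with Windows line endings, byte for byte.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("\u00dcberschrift\r\nText\r\n".encode("utf-8"))
    path = tmp.name

# Text mode with an explicit encoding yields str. splitlines() drops all
# line terminators (including the final one), so joining with '\n'
# gives a platform-independent string.
with open(path, "r", encoding="utf-8") as f:
    expected = "\n".join(f.read().splitlines())
os.remove(path)

print(repr(expected))
```
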
Modified: trunk/docutils/test/test_io.py
===================================================================
--- trunk/docutils/test/test_io.py	2022-06-17 11:31:17 UTC (rev 9076)
+++ trunk/docutils/test/test_io.py	2022-06-17 11:31:28 UTC (rev 9077)
@@ -130,8 +130,8 @@
         self.assertEqual(data, ['Some include text.\n'])
 
     def test_heuristics_no_utf8(self):
-        # if no encoding is given and decoding with utf-8 fails,
-        # use either the locale encoding (if specified) or latin-1:
+        # if no encoding is given and decoding with 'utf-8' fails,
+        # use either the locale encoding (if specified) or 'latin-1':
         if io.locale_encoding not in ('utf-8', 'utf8'):
             # in Py3k, the locale encoding is used without --input-encoding
             # skipping the heuristic unless decoding fails.

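The comment in `test_heuristics_no_utf8` describes a decoding fallback: try UTF-8 first, then the locale encoding if one is known, then latin-1. This is a simplified sketch of that order. `decode_with_fallback` is hypothetical and not the actual `docutils.io` code. In real use the locale encoding might come from `locale.getpreferredencoding()`; here it is passed in explicitly.

```python
def decode_with_fallback(data, locale_encoding=None):
    """Decode bytes with 'utf-8', then the locale encoding, then 'latin-1'.

    Hypothetical helper illustrating the heuristic; returns (text, encoding).
    """
    candidates = ["utf-8"]
    if locale_encoding:
        candidates.append(locale_encoding)
    candidates.append("latin-1")
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Not reached: latin-1 maps every byte value to a character.
    raise AssertionError("latin-1 decoding cannot fail")
```

Because latin-1 never fails, it works as a last resort, but it may silently produce the wrong characters for text that was written in another legacy encoding.
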