From: <mi...@us...> - 2022-06-17 11:31:31
Revision: 9077
http://sourceforge.net/p/docutils/code/9077
Author: milde
Date: 2022-06-17 11:31:28 +0000 (Fri, 17 Jun 2022)
Log Message:
-----------
Documentation update
Remove dead link and outdated footnote about limitations in Python2.
Add link to acceptable values of encoding error handlers.
Harmonise help output.
Use UTF-8 in prose text, error messages, and documentation.
Use 'utf-8' in code or when referring to code.
Modified Paths:
--------------
trunk/docutils/FAQ.txt
trunk/docutils/docs/ref/rst/directives.txt
trunk/docutils/docs/user/config.txt
trunk/docutils/docutils/frontend.py
trunk/docutils/test/data/help/docutils.txt
trunk/docutils/test/test_functional.py
trunk/docutils/test/test_io.py
Modified: trunk/docutils/FAQ.txt
===================================================================
--- trunk/docutils/FAQ.txt 2022-06-17 11:31:17 UTC (rev 9076)
+++ trunk/docutils/FAQ.txt 2022-06-17 11:31:28 UTC (rev 9077)
@@ -1099,12 +1099,9 @@
* `Python Unicode Tutorial
<http://www.reportlab.com/i18n/python_unicode_tutorial.html>`_
-* `Python Unicode Objects: Some Observations on Working With Non-ASCII
- Character Sets <http://effbot.org/zone/unicode-objects.htm>`_
-
The common case is with the default output encoding (UTF-8), when
either numbered sections are used (via the "sectnum_" directive) or
-symbol-footnotes. 3 non-breaking spaces are inserted in each numbered
+symbol-footnotes. Three non-breaking spaces are inserted in each numbered
section title, between the generated number and the title text. Most
footnote symbols are not available in ASCII, nor are non-breaking
spaces. When encoded with UTF-8 and viewed with ordinary ASCII tools,
@@ -1111,14 +1108,12 @@
these characters will appear to be multi-character garbage.
You may have an decoding problem in your browser (or editor, etc.).
-The encoding of the output is set to "utf-8", but your browser isn't
+The encoding of the output is set to UTF-8, but your browser isn't
recognizing that. You can either try to fix your browser (enable
"UTF-8 character set", sometimes called "Unicode"), or choose a
-different encoding for the HTML output. You can also try
+different `output-encoding`_. You can also try
``--output-encoding=ascii:xmlcharrefreplace`` for HTML or XML, but not
-applicable to non-XMLish outputs (if using runtime
-settings/configuration files, use ``output_encoding=ascii`` and
-``output_encoding_error_handler=xmlcharrefreplace``).
+applicable to non-XMLish outputs.
If you're generating document fragments, the "Content-Type" metadata
(between the HTML ``<head>`` and ``</head>`` tags) must agree with the
@@ -1132,6 +1127,7 @@
<?xml version="1.0" encoding="utf-8" ?>
.. _sectnum: docs/ref/rst/directives.html#sectnum
+.. _output-encoding: docs/user/config.html#output-encoding
How can I retrieve the body of the HTML document?
Modified: trunk/docutils/docs/ref/rst/directives.txt
===================================================================
--- trunk/docutils/docs/ref/rst/directives.txt 2022-06-17 11:31:17 UTC (rev 9076)
+++ trunk/docutils/docs/ref/rst/directives.txt 2022-06-17 11:31:28 UTC (rev 9077)
@@ -884,7 +884,7 @@
The horizontal alignment of the table. (New in Docutils 0.13)
``delim`` : char | "tab" | "space" [#whitespace-delim]_
- A one-character string\ [#ASCII-char]_ used to separate fields.
+ A one-character string used to separate fields.
Defaults to ``,`` (comma). May be specified as a Unicode code
point; see the unicode_ directive for syntax details.
@@ -893,7 +893,7 @@
Defaults to the document's input_encoding_.
``escape`` : char
- A one-character\ [#ASCII-char]_ string used to escape the
+ A one-character string used to escape the
delimiter or quote characters. May be specified as a Unicode
code point; see the unicode_ directive for syntax details. Used
when the delimiter is used in an unquoted field, or when quote
@@ -920,7 +920,7 @@
significant. The default is to ignore such whitespace.
``quote`` : char
- A one-character string\ [#ASCII-char]_ used to quote elements
+ A one-character string used to quote elements
containing the delimiter or which start with the quote
character. Defaults to ``"`` (quote). May be specified as a
Unicode code point; see the unicode_ directive for syntax
@@ -950,13 +950,7 @@
.. [#whitespace-delim] Whitespace delimiters are supported only for external
CSV files.
-.. [#ASCII-char] With Python 2, the values for the ``delimiter``,
- ``quote``, and ``escape`` options must be ASCII characters. (The csv
- module does not support Unicode and all non-ASCII characters are
- encoded as multi-byte utf-8 string). This limitation does not exist
- under Python 3.
-
List Table
==========
Modified: trunk/docutils/docs/user/config.txt
===================================================================
--- trunk/docutils/docs/user/config.txt 2022-06-17 11:31:17 UTC (rev 9076)
+++ trunk/docutils/docs/user/config.txt 2022-06-17 11:31:28 UTC (rev 9077)
@@ -298,8 +298,9 @@
error_encoding_error_handler
----------------------------
-The error handler for unencodable characters in error output. See
-output_encoding_error_handler_ for acceptable values.
+The error handler for unencodable characters in error output.
+Acceptable values are the `Error Handlers`_ of Python's "encoding" module.
+See also output_encoding_error_handler_.
Default: "backslashreplace"
Options: ``--error-encoding-error-handler, --error-encoding, -e``.
@@ -373,8 +374,9 @@
input_encoding_error_handler
----------------------------
-The error handler for undecodable characters in the input. Acceptable
-values include:
+The error handler for undecodable characters in the input.
+Acceptable values are the `Error Handlers`_ of Python's "encoding" module,
+including:
strict
Raise an exception in case of an encoding error.
@@ -384,10 +386,6 @@
ignore
Ignore malformed data and continue without further notice.
-Acceptable values are the same as for the "error" parameter of
-Python's ``unicode`` function; other values may be defined in
-applications or in future versions of Python.
-
Default: "strict".
Options: ``--input-encoding-error-handler, --input-encoding, -i``.
@@ -421,13 +419,14 @@
The text encoding for output.
-Default: "UTF-8". Options: ``--output-encoding, -o``.
+Default: "utf-8". Options: ``--output-encoding, -o``.
output_encoding_error_handler
-----------------------------
-The error handler for unencodable characters in the output. Acceptable
-values include:
+The error handler for unencodable characters in the output.
+Acceptable values are the `Error Handlers`_ of Python's "encoding" module,
+including:
strict
Raise an exception in case of an encoding error.
@@ -442,10 +441,6 @@
backslashreplace
Replace with backslash escape sequences, such as "``\u2020``".
-Acceptable values are the same as for the "error" parameter of
-Python's ``encode`` string method; other values may be defined in
-applications or in future versions of Python.
-
Default: "strict".
Options: ``--output-encoding-error-handler, --output-encoding, -o``.
@@ -455,7 +450,7 @@
Path to a file where Docutils will write a list of files that were
required to generate the output, e.g. included files or embedded
stylesheets [#dependencies]_. [#pwd]_ The format is one path per
-line with forward slashes as separator, the encoding is ``utf8``.
+line with forward slashes as separator, the encoding is UTF-8.
Set to ``-`` in order to write dependencies to stdout.
@@ -465,10 +460,10 @@
ham.html: ham.txt $(shell cat hamdeps.txt)
rst2html.py --record-dependencies=hamdeps.txt ham.txt ham.html
-If the filesystem encoding differs from utf8, replace the ``cat``
+If the filesystem encoding differs from UTF-8, replace the ``cat``
command with a call to a converter, e.g.::
- $(shell iconv -f utf8 -t latin1 hamdeps.txt)
+ $(shell iconv -f utf-8 -t latin1 hamdeps.txt)
Default: None. Option: ``--record-dependencies``.
@@ -2336,3 +2331,6 @@
.. _option lists: ../ref/rst/restructuredtext.html#option-lists
.. _tables: ../ref/rst/restructuredtext.html#tables
.. _table of contents: ../ref/rst/directives.html#contents
+
+.. _Error Handlers:
+ https://docs.python.org/3/library/codecs.html#error-handlers
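
A minimal sketch of how the error handlers named in the config.txt changes above behave. It assumes plain Python 3 ``str.encode`` rather than Docutils itself, and the sample string is made up for illustration:

```python
# Illustration only: plain Python 3, not Docutils code.
# U+2020 (DAGGER) is the footnote symbol used in the config.txt example;
# it cannot be represented in ASCII.
text = "note\u2020"

# Handlers that substitute something for the unencodable character.
results = {}
for handler in ("ignore", "replace", "xmlcharrefreplace", "backslashreplace"):
    results[handler] = text.encode("ascii", errors=handler)

# "strict" raises an exception instead of substituting.
try:
    text.encode("ascii", errors="strict")
    strict_raised = False
except UnicodeEncodeError:
    strict_raised = True

for handler, encoded in results.items():
    print(handler, encoded)
print("strict raised:", strict_raised)
```

Only ``xmlcharrefreplace`` produces output that an HTML or XML consumer can turn back into the original character, which is consistent with the FAQ's remark that it is not applicable to non-XMLish outputs.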
Modified: trunk/docutils/docutils/frontend.py
===================================================================
--- trunk/docutils/docutils/frontend.py 2022-06-17 11:31:17 UTC (rev 9076)
+++ trunk/docutils/docutils/frontend.py 2022-06-17 11:31:28 UTC (rev 9077)
@@ -562,7 +562,7 @@
('Disable Python tracebacks. (default)',
['--no-traceback'], {'dest': 'traceback', 'action': 'store_false'}),
('Specify the encoding and optionally the '
- 'error handler of input text. Default: <locale-dependent>:strict.',
+ 'error handler of input text. Default: <auto-detect>:strict.',
['--input-encoding', '-i'],
{'metavar': '<name[:handler]>',
'validator': validate_encoding_and_error_handler}),
@@ -580,8 +580,8 @@
'"xmlcharrefreplace", "backslashreplace".',
['--output-encoding-error-handler'],
{'default': 'strict', 'validator': validate_encoding_error_handler}),
- ('Specify text encoding and error handler for error output. '
- 'Default: %s:%s.'
+ ('Specify text encoding and optionally error handler '
+ 'for error output. Default: %s:%s.'
% (default_error_encoding, default_error_encoding_error_handler),
['--error-encoding', '-e'],
{'metavar': '<name[:handler]>', 'default': default_error_encoding,
Modified: trunk/docutils/test/data/help/docutils.txt
===================================================================
--- trunk/docutils/test/data/help/docutils.txt 2022-06-17 11:31:17 UTC (rev 9076)
+++ trunk/docutils/test/data/help/docutils.txt 2022-06-17 11:31:28 UTC (rev 9077)
@@ -55,7 +55,7 @@
--no-traceback Disable Python tracebacks. (default)
--input-encoding=<name[:handler]>, -i <name[:handler]>
Specify the encoding and optionally the error handler
- of input text. Default: <locale-dependent>:strict.
+ of input text. Default: <auto-detect>:strict.
--input-encoding-error-handler=INPUT_ENCODING_ERROR_HANDLER
Specify the error handler for undecodable characters.
Choices: "strict" (default), "ignore", and "replace".
@@ -67,8 +67,8 @@
characters; "strict" (default), "ignore", "replace",
"xmlcharrefreplace", "backslashreplace".
--error-encoding=<name[:handler]>, -e <name[:handler]>
- Specify text encoding and error handler for error
- output. Default: utf-8:backslashreplace.
+ Specify text encoding and optionally error handler for
+ error output. Default: utf-8:backslashreplace.
--error-encoding-error-handler=ERROR_ENCODING_ERROR_HANDLER
Specify the error handler for unencodable characters
in error output. Default: backslashreplace.
Modified: trunk/docutils/test/test_functional.py
===================================================================
--- trunk/docutils/test/test_functional.py 2022-06-17 11:31:17 UTC (rev 9076)
+++ trunk/docutils/test/test_functional.py 2022-06-17 11:31:28 UTC (rev 9077)
@@ -157,7 +157,7 @@
no_expected = self.no_expected_template % {
'exp': expected_path, 'out': params['destination_path']}
self.assertTrue(os.access(expected_path, os.R_OK), no_expected)
- # samples are UTF8 encoded. 'rb' leads to errors with Python 3!
+ # samples are UTF-8 encoded. 'rb' leads to errors with Python 3!
f = open(expected_path, 'r', encoding='utf-8')
# Normalize line endings:
expected = '\n'.join(f.read().splitlines())
Modified: trunk/docutils/test/test_io.py
===================================================================
--- trunk/docutils/test/test_io.py 2022-06-17 11:31:17 UTC (rev 9076)
+++ trunk/docutils/test/test_io.py 2022-06-17 11:31:28 UTC (rev 9077)
@@ -130,8 +130,8 @@
self.assertEqual(data, ['Some include text.\n'])
def test_heuristics_no_utf8(self):
- # if no encoding is given and decoding with utf-8 fails,
- # use either the locale encoding (if specified) or latin-1:
+ # if no encoding is given and decoding with 'utf-8' fails,
+ # use either the locale encoding (if specified) or 'latin-1':
if io.locale_encoding not in ('utf-8', 'utf8'):
# in Py3k, the locale encoding is used without --input-encoding
# skipping the heuristic unless decoding fails.
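
The fallback described in the test comment above can be sketched roughly as follows. This is an assumption-laden illustration, not the code in ``docutils.io``: the function name ``decode_with_fallback`` and its ``locale_encoding`` parameter are hypothetical, and the exact candidate order in Docutils may differ.

```python
import locale


def decode_with_fallback(data, locale_encoding=None):
    """Decode bytes without a declared encoding (hypothetical helper).

    Try 'utf-8' first; if that fails, try the locale encoding (when it
    is set and is not UTF-8 itself), then 'latin-1' as a last resort.
    Returns a (text, encoding_used) tuple.
    """
    if locale_encoding is None:
        # Assumption: ask the standard library for the locale encoding.
        locale_encoding = locale.getpreferredencoding(False)

    candidates = ["utf-8"]
    if locale_encoding and locale_encoding.lower() not in ("utf-8", "utf8"):
        candidates.append(locale_encoding)
    candidates.append("latin-1")

    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except (UnicodeError, LookupError):
            # Undecodable bytes or unknown codec name: try the next one.
            continue
    # 'latin-1' maps every byte to a character, so this is not expected
    # to be reached; it is kept as a guard.
    raise UnicodeError("unable to decode input")


# A lone 0xE4 byte is invalid UTF-8 but is a valid 'latin-1' character.
text, used = decode_with_fallback(b"\xe4", locale_encoding="utf-8")
print(repr(text), used)
```

Because 'latin-1' accepts any byte sequence, it has to come last: placed earlier, it would mask every other candidate.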