On 2011-12-06, Martin Bless wrote:
> [Guenter Milde] wrote & schrieb:
>>-> decoding the command line (sys.argv) shall use
>> docutils.frontend.locale_encoding and not sys.stdin.encoding!
> [...]
>>True. I will remove the `encoding=...` option. Does this solve the problem?
>>(If not, please post the output of just running test_dependencies.py.)
> There is one error left in test_dependencies.py.
> And one in test_heuristics_latin1.py
>==============================
> Docutils version used:
>==============================
> Revision: 7247
> Author: milde
> Date: Monday, 5 December 2011 22:20:43
> Message: argv_encoding != sys.stdin.encoding
> Thanks to Martin Bless for the report and tests.
> ----
> Modified : /trunk/docutils/docutils/core.py
> Modified : /trunk/docutils/test/test_command_line.py
>==============================
> Test: alltests.py, Win 7 64bit
>==============================
> E:\kannweg>python alltests.py
> Testing Docutils 0.9 [repository] with Python 2.7.2 on 2011-12-06 at
> 09:47:04
> [...]
> ........................................................
>======================================================================
> FAIL: test_dependencies (test_dependencies.RecordDependenciesTests)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
> File "D:\Repositories\docutils\test\test_dependencies.py", line 52,
> in test_dependencies
> u'\u043a\u0430\u0440\u0442\u0438\u043d\u0430.jpg'])
> AssertionError: Lists differ: [u'data/include.txt', u'data/r... !=
> ['data/include.txt', 'data/raw...
> First differing element 3:
> ???????.jpg
> \u043a\u0430\u0440\u0442\u0438\u043d\u0430.jpg
> - [u'data/include.txt', u'data/raw.txt', u'some_image.png',
> u'???????.jpg']
> + ['data/include.txt',
> + 'data/raw.txt',
> + 'some_image.png',
> + u'\u043a\u0430\u0440\u0442\u0438\u043d\u0430.jpg']
I forgot an important point: The record file is generated during the
Docutils run and stores the names of files required to generate the
output document (see
http://docutils.sourceforge.net/docs/user/config.html#record-dependencies).
This is why its content is encoded in the file system encoding.
In order to prevent encoding errors while writing this file but still
have some meaningful and reversible content, the encoding-error-handler
is set to xmlcharrefreplace in the DependencyList class in utils.py. The
test case needs to read the file with these settings and compare the
content to a list of unicode strings.
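To illustrate why xmlcharrefreplace gives "meaningful and reversible" content where plain replacement gives the `???????.jpg` seen in the failure above, here is a minimal sketch. It is written for Python 3 and assumes cp1252 as the file system encoding (a typical Western Windows setup, not something confirmed for the test machine); `html.unescape` is only used here to show that the original name can be recovered, it is not what Docutils does.

```python
import html

# The Cyrillic file name used in test_dependencies.py.
name = "\u043a\u0430\u0440\u0442\u0438\u043d\u0430.jpg"

# Assumption: cp1252 stands in for the local file system encoding.
fs_encoding = "cp1252"

# "replace" loses the information: every unencodable character becomes "?".
lossy = name.encode(fs_encoding, errors="replace")

# "xmlcharrefreplace" writes numeric character references instead.
reversible = name.encode(fs_encoding, errors="xmlcharrefreplace")

# The references can be turned back into the original characters.
restored = html.unescape(reversible.decode(fs_encoding))

print(lossy)       # b'???????.jpg'
print(reversible)  # b'&#1082;&#1072;&#1088;&#1090;&#1080;&#1085;&#1072;.jpg'
print(restored == name)  # True
```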
Does it work with the following patch?
--- test_dependencies.py (Revision 7246)
+++ test_dependencies.py (working copy)
@@ -35,8 +35,10 @@
docutils.core.publish_file(
destination=DocutilsTestSupport.DevNull(), **settings)
settings['settings_overrides']['record_dependencies'].close()
+ # The record file contains filenames in the local file system encoding
record = docutils.io.FileInput(source_path=recordfile,
- encoding='utf8')
+ encoding=sys.getfilesystemencoding(),
+ error_handler='xmlcharrefreplace')
return record.read().splitlines()
def test_dependencies(self):
>======================================================================
> FAIL: test_heuristics_latin1 (test_io.InputTests)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
> File "D:\Repositories\docutils\test\test_io.py", line 81, in
> test_heuristics_latin1
> self.assertEqual(input.successful_encoding, 'latin-1')
> AssertionError: 'cp1252' != 'latin-1'
This is a false positive. The heuristics should try the locale encoding
(if it is specified) before the "last resort" latin-1.
Fixed in Revision 7248.
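
The intended order can be sketched roughly as follows. This is not the code in docutils/io.py, just an illustration for Python 3: the function name `decode_with_fallback` is hypothetical, and trying UTF-8 first is my assumption for the sketch. The point is only that a specified locale encoding comes before latin-1, which accepts any byte sequence and therefore always "succeeds".

```python
def decode_with_fallback(data, locale_encoding=None):
    """Return (text, successful_encoding) for the byte string `data`.

    Hypothetical helper: tries UTF-8 (an assumption of this sketch),
    then the locale encoding if one is given, then latin-1.
    """
    candidates = ["utf-8"]
    if locale_encoding:
        candidates.append(locale_encoding)
    # latin-1 maps every byte to a character, so it never fails.
    candidates.append("latin-1")
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except (UnicodeError, LookupError):
            continue
    raise AssertionError("unreachable: latin-1 decodes any input")


# On a cp1252 locale the locale encoding wins over latin-1 ...
print(decode_with_fallback(b"caf\xe9", "cp1252"))
# ... and latin-1 is only reported when nothing else fits.
print(decode_with_fallback(b"caf\xe9"))
```

With this order, 'cp1252' is the expected result on Martin's system, so the test must not insist on 'latin-1'.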
Thanks
Günter