>>> This could be another configuration option, with a default of
>> If stderr is ASCII-encoded, any character whose ord() > 127 will
>> cause a traceback.
> A stream is a sequence of bytes, and has no inherent encoding. A
> non-Unicode-String in Python is also a sequence of bytes. It is
> possible to write any character < 256 onto a stream, unless it's
> part of a Unicode string:
This is true, but beside the point (and unrelated to the text I
originally quoted). The point is that if we have a configuration
option for the stderr stream encoding, and that option is set to
'ASCII', and some ord(character) > 127, we will get a traceback:
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: ASCII encoding error: ordinal not in range(128)
But as you (seem to have) pointed out, we can use "errors='ignore'":
>>> u'\u00fc'.encode('ascii', 'ignore')
Not very useful though, since potentially important characters
disappear. Better is 'replace':
>>> u'\u00fc'.encode('ascii', 'replace')
But not by much. repr() isn't acceptable, because it does too much:
> PEP293 introduces encoding error callback functions into Python 2.3.
A little digging reveals the solution:
>>> u'\u00fc'.encode('ascii', 'backslashreplace')
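For comparison, here is a minimal sketch of what each error handler
does with the same character. It is written to run on later Python
versions too (where encode() returns a bytes object), and the
``encode_for_stderr`` helper is a hypothetical name, not part of
Docutils:

```python
def encode_for_stderr(text, encoding='ascii', errors='backslashreplace'):
    """Encode `text` for an error stream (hypothetical helper)."""
    return text.encode(encoding, errors)


sample = u'\u00fc'  # LATIN SMALL LETTER U WITH DIAERESIS

# 'strict' is the default handler and raises on the first bad character.
try:
    encode_for_stderr(sample, errors='strict')
    strict_failed = False
except UnicodeError:
    strict_failed = True

ignored = encode_for_stderr(sample, errors='ignore')    # character dropped
replaced = encode_for_stderr(sample, errors='replace')  # becomes "?"
escaped = encode_for_stderr(sample)                     # becomes \xfc escape
```

Only the last form keeps enough information to tell which character
was in the input.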
This only works under Python 2.3 or later, though. I think the best solution
would be to establish a runtime setting with a version-agnostic
default set at startup::
settings.error_callback = 'backslashreplace'
settings.error_callback = 'replace'
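One way such a default might be chosen at startup is to probe for the
handler rather than compare version numbers. This is only a sketch: it
assumes that a missing ``codecs.lookup_error`` (added by PEP 293) or an
unregistered handler name is what distinguishes older interpreters, and
``default_error_callback`` is a made-up name:

```python
import codecs


def default_error_callback():
    """Pick the most informative error handler this interpreter offers.

    Hypothetical helper: falls back to 'replace' when the PEP 293
    machinery or the 'backslashreplace' handler is unavailable.
    """
    try:
        codecs.lookup_error('backslashreplace')
    except (AttributeError, LookupError):
        return 'replace'
    return 'backslashreplace'


error_callback = default_error_callback()
```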
And in docutils.utils.Reporter.system_message use::
msgtext = unicode(msg.astext()).encode(
As for "--error-encoding", the default should be 'ASCII' as the lowest
common denominator. Reporter objects don't know about runtime
settings now; either the settings object or the
settings.error_encoding and settings.error_callback values will have
to be passed to the constructor. Or the "stream" object in each
ConditionSet could be wrapped by ``codecs.EncodedFile``. Or something
like that; I don't have the will right now to figure out what's
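The stream-wrapping idea might look roughly like the sketch below. It
uses ``codecs.getwriter`` rather than ``codecs.EncodedFile``, since the
former takes Unicode text plus an error handler directly; that swap is
my assumption, as are the ``wrap_error_stream`` name and its parameter
names. An in-memory byte buffer stands in for stderr:

```python
import codecs
import io


def wrap_error_stream(stream, error_encoding='ascii',
                      error_callback='backslashreplace'):
    """Return a writer that encodes text before writing it to `stream`.

    Hypothetical helper; `stream` must accept encoded byte strings.
    """
    writer_class = codecs.getwriter(error_encoding)
    return writer_class(stream, error_callback)


raw = io.BytesIO()  # stands in for a byte-oriented stderr
wrapped = wrap_error_stream(raw)
wrapped.write(u'Unknown role: "\u00fc"\n')
output = raw.getvalue()  # ASCII-only bytes, with the u-umlaut escaped
```

With a wrapper like this, Reporter would only need to receive the two
setting values once, when the stream is set up.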
>>> I'll add another short demo, containing some kanji characters.
>> "Fireworks"! How does that work with your patch?
> It works fine. And with a UTF-8 xterm or Terminal, it is even
> readable in the error message. With a latin-1 terminal, it's still
> printed, but not readable.
"Printed, but not readable" is not very useful. "?" or "\u####" is
better than garbage.
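To illustrate the "garbage" case, this small sketch simulates UTF-8
output being displayed by a Latin-1 terminal (the terminal is modelled
simply as decoding the bytes as Latin-1, which is an assumption about
how it behaves):

```python
text = u'\u00fc'                         # one character: u-umlaut
utf8_bytes = text.encode('utf-8')        # two bytes on the wire
garbled = utf8_bytes.decode('latin-1')   # what a Latin-1 terminal shows
```

The single character comes out as two unrelated ones (A-tilde followed
by a one-quarter sign), and nothing tells the reader what went wrong.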
> Another thought: If the error messages only quote text from the
> original file, it would be possible to default to the encoding used
> for the source file.
I don't think we can safely assume that input encoding and terminal
encoding are related. Better to be explicit.
David Goodger http://starship.python.net/~goodger
Programmer/sysadmin for hire: http://starship.python.net/~goodger/cv