On 2011-04-16, Stefan Merten wrote:
> 3 days ago Guenter Milde wrote:
>> On 2011-04-04, Stefan Merten wrote:
>> Actually, this is a Python bug. It should be fine with Python >= 2.6
> I don't think so - at least not the last part::
So this is more complex than I assumed.
> crashes. This is before your patch but with Python 2.6.5.
I suppose the problem is the "implicit" unicode -> str conversions, which
use the "ASCII, strict" encoding.
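
To illustrate what I mean, here is a minimal sketch (not Docutils code;
`encode_strict_ascii` is a name made up for this mail) that spells out
explicitly what such an implicit conversion amounts to:

```python
# Sketch only: an explicit version of the "ascii", "strict" conversion
# that Python 2 applies implicitly when a unicode object must become a str.
text = u'\xe4'  # a single non-ASCII character (a-umlaut)


def encode_strict_ascii(value):
    """Hypothetical helper: return (ok, result).

    `result` is the encoded byte string on success, or the
    UnicodeEncodeError instance on failure.
    """
    try:
        return True, value.encode('ascii', 'strict')
    except UnicodeEncodeError as error:
        return False, error


# Pure ASCII text survives the conversion ...
ok_plain, encoded_plain = encode_strict_ascii(u'plain ASCII')
# ... but any character outside range(128) raises.
ok_umlaut, error_umlaut = encode_strict_ascii(text)
```

The second call fails with the same kind of error as in your traceback.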
> I think the problem is not in the Python problem you mentioned but in
> the code at `docutils/docutils/statemachine.py:212`::
> print >>sys.stderr, (
> '\nStateMachine.run: input_lines (line_offset=%s):\n| %s'
> % (self.line_offset, '\n| '.join(self.input_lines)))
> IMHO the print statement causes the problem
But maybe it is also the conversion in the join()
> $ python
> Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56)
> [GCC 4.4.3] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import sys
> >>> print >>sys.stderr, "%s" % ( '\n| '.join(u'\xe4'), )
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 0: ordinal not in range(128)
> >>> sys.stderr.encoding
> Yes, I still have LANG=C...
By design, this should be OK with the default
Specify the error handler for unencodable characters
in error output. Default: backslashreplace.
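
For comparison, a small sketch of what the `backslashreplace` handler does
when it is applied explicitly (plain Python codec behaviour; I am assuming
an ASCII target encoding here, as with LANG=C):

```python
# Sketch only: with "backslashreplace", unencodable characters are
# written as escape sequences instead of raising UnicodeEncodeError.
text = u'line with \xe4'

# The a-umlaut becomes the four ASCII characters \ x e 4.
escaped = text.encode('ascii', 'backslashreplace')
```

So if the error handler were in effect at that point, the debug output
would show `\xe4` rather than crash.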
Can you test the following patch?
--- statemachine.py (Revision 7006)
+++ statemachine.py (Arbeitskopie)
@@ -209,7 +209,7 @@
print >>sys.stderr, (
'\nStateMachine.run: input_lines (line_offset=%s):\n| %s'
- % (self.line_offset, '\n| '.join(self.input_lines)))
+ % (self.line_offset, u'\n| '.join(self.input_lines)))
transitions = None
results = []
state = self.get_state()
> With all the logging facilities Docutils has I guess it would be
> feasible to use them instead of printing things simply out to
> `sys.stderr`. However, this idiom is quite common.
AFAIK, the printing to sys.stderr is usually done in addition to the
logging via the reporter class.
> Here are the hits
> for `sys.stderr` in some core sources::
> I guess all these places need to be fixed :-( .
I am not sure. Where the string is pure ASCII, nothing
needs to be done. Things change with substitutions that include
parts of the document (which is generally a unicode object).
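
A rough sketch of the distinction (the helper `format_debug_line` is
hypothetical, not something that exists in Docutils, and the ASCII target
encoding is an assumption): do the substitution in unicode and encode only
once, at the end, with a forgiving error handler:

```python
# Sketch only: substitute in unicode, encode once at the very end.
def format_debug_line(template, *parts):
    """Hypothetical helper: build a debug message and make it ASCII-safe."""
    message = template % parts  # unicode template, so the result is unicode
    return message.encode('ascii', 'backslashreplace')


# A message made of pure ASCII pieces needs no special care.
ascii_only = format_debug_line(u'line_offset=%s', 3)

# A message that includes document text may contain any character.
with_document_text = format_debug_line(u'input: %s', u'gr\xfc\xdfe')
```

Only the second kind of call site would actually need a fix.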
It would be wonderful if you could prepare test cases for the Docutils