From: William D. <wi...@fl...> - 2003-04-28 21:14:04
Hi,

I have the same problem when there are errors in a text with accents:

.. é ::

  File "/home/web/p/publy/www/admin/docutils/utils.py", line 170, in system_message
    print >>stream, msg.astext()
UnicodeError: ASCII encoding error: ordinal not in range(128)

I haven't looked for a solution yet. I wonder if nobody has had this problem before?

-- 
William Dode - http://flibuste.net
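The traceback above comes from Python 2's `print >>stream`. Here is a minimal modern Python 3 sketch of the same failure. The helper `emit()` is hypothetical. It uses an in-memory `io.BytesIO` as a stand-in for a terminal with a fixed encoding, so the same message succeeds with Latin-1 and raises `UnicodeEncodeError` with ASCII.

```python
import io


def emit(message, encoding):
    """Write `message` to a text stream with a fixed encoding (a stand-in
    for sys.stderr) and return the bytes that reached the "terminal"."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, newline="\n")
    stream.write(message + "\n")  # str -> bytes conversion happens here
    stream.flush()
    data = raw.getvalue()
    stream.detach()  # release `raw` without closing it
    return data


msg = 'Error in ".. é ::"'
print(emit(msg, "latin-1"))  # a terminal that accepts Latin-1: works

try:
    emit(msg, "ascii")  # an ASCII-only terminal: fails like the report
except UnicodeEncodeError as exc:
    print("UnicodeEncodeError:", exc.reason)
```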
From: David G. <go...@py...> - 2003-04-29 04:34:41
William Dode wrote:
> I have the same problem when there are errors in a text with accents:
>
> .. é ::
>
>   File "/home/web/p/publy/www/admin/docutils/utils.py", line 170, in
>   system_message
>     print >>stream, msg.astext()
> UnicodeError: ASCII encoding error: ordinal not in range(128)
>
> I haven't looked for a solution yet. I wonder if nobody has had this
> problem before?

It has come up before, actually. On 2003-02-10, I replied to Adam Chodorowski:

    It's the diagnostic output that's causing the problem. There's a
    markup error on a section title that includes an accented character.
    A system message which includes a non-ASCII character is being sent
    to sys.stderr. Python believes your sys.stderr can't handle anything
    other than ASCII, and complains loudly.

    If your sys.stderr *can* handle Latin-1 output, telling Python about
    it may help. I'm not sure how to tell Python, though -- perhaps via
    the default encoding in site.py?

    I'm not sure how to fix this, if it should be fixed, or even if
    anything is really broken. If I'm processing encoded text from an
    ASCII-only console, what kind of error output *should* I get? Should
    Docutils prevent non-ASCII output from being written to sys.stderr?
    Whose responsibility is it to set up the stderr environment for
    encoded output?

    We could write all error output with
    ``encodings.raw_unicode_escape.StreamWriter(sys.stderr).write()``,
    but that's kind of drastic. It means that all systems, even those
    that are encoding-aware, would get dumbed-down (and not easily
    deciphered) error output.

Adam replied:

    The assumption should, IMHO, be that the user's terminal can handle
    the encoding that the file is in. I mean, the user most likely worked
    with the file in the terminal (perhaps with a text-mode editor, or
    'less', or 'cat', or...), so it seems a good bet. I would think this
    is true in 99% of cases. In other cases, e.g. where you are
    processing a lot of documentation in different encodings on the same
    machine (e.g. documentation for some software), the user is probably
    also better equipped to handle the problems. In any case, saying
    that "the error output will be in the same encoding as the input
    file" would simplify things a lot in that scenario too (you can make
    sane build scripts)...

I'm not sure that I agree with this. Ideas and discussion are welcome.

-- 
David Goodger  http://starship.python.net/~goodger
Programmer/sysadmin for hire: http://starship.python.net/~goodger/cv
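The escaping idea can be sketched in modern Python 3 with `codecs.getwriter()`. One detail matters here. In current Python, `raw_unicode_escape` is documented as Latin-1 with `\uXXXX` escapes only for code points above U+00FF. As a result, `é` still goes out as a raw Latin-1 byte, and only characters such as `€` are escaped. Pairing the ASCII codec with the standard `backslashreplace` error handler gives pure-ASCII output instead. That pairing is a swapped-in alternative, not what the post proposes. The helper `escaped_bytes()` is hypothetical.

```python
import codecs
import io


def escaped_bytes(message, encoding, errors="strict"):
    """Encode `message` through a codecs StreamWriter, as an error-stream
    wrapper would, and return the bytes that were written."""
    raw = io.BytesIO()
    writer = codecs.getwriter(encoding)(raw, errors)
    writer.write(message)
    return raw.getvalue()


msg = "é and €"
# raw_unicode_escape: Latin-1 bytes up to U+00FF, \uXXXX escapes above that
print(escaped_bytes(msg, "raw_unicode_escape"))
# ASCII plus backslashreplace: every non-ASCII character becomes an escape
print(escaped_bytes(msg, "ascii", "backslashreplace"))
```

Neither form is easy to read, which is the "dumbed-down" cost described above.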
From: David G. <go...@py...> - 2003-04-29 04:40:56
I wrote:
> Whose responsibility is it to set up the stderr
> environment for encoded output?

I just had a thought: perhaps Docutils should grow an --error-encoding
option. Docutils could use it to tell Python how to encode sys.stderr.

-- 
David Goodger
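Here is a rough modern Python 3 sketch of how such an option might work. The `--error-encoding` flag, the parser, and the helpers `make_error_stream()` and `error_stream_from_args()` are hypothetical illustrations, not an existing Docutils interface. Using the `backslashreplace` error handler is an added assumption. With it, characters the chosen encoding cannot represent are escaped instead of raising an error.

```python
import argparse
import io
import sys

# Hypothetical command-line option, as floated in the post.
parser = argparse.ArgumentParser()
parser.add_argument("--error-encoding", default=None,
                    help="encoding used for diagnostic output")


def make_error_stream(binary_stream, encoding):
    """Wrap a binary stream so diagnostics are encoded explicitly."""
    return io.TextIOWrapper(binary_stream, encoding=encoding,
                            errors="backslashreplace", newline="\n",
                            line_buffering=True)


def error_stream_from_args(argv, binary_stream):
    args = parser.parse_args(argv)
    # Fall back to what Python believes stderr uses, then to ASCII.
    encoding = (args.error_encoding
                or getattr(sys.stderr, "encoding", None)
                or "ascii")
    return make_error_stream(binary_stream, encoding)


# In real use, binary_stream would be sys.stderr.buffer.
raw = io.BytesIO()
err = error_stream_from_args(["--error-encoding", "latin-1"], raw)
print('Error in ".. é ::"', file=err)
```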
From: William D. <wi...@fl...> - 2003-04-30 12:33:14
David Goodger <go...@py...> writes:

> I wrote:
> > Whose responsibility is it to set up the stderr
> > environment for encoded output?
>
> I just had a thought: perhaps Docutils should grow an --error-encoding
> option. Docutils could use it to tell Python how to encode sys.stderr.

Maybe. What is strange is that I use HTML output, and I can raise "éàè"
without a problem...

But why does Docutils change the encoding if the input is the same as the
output?

-- 
William Dode - http://flibuste.net
From: David G. <go...@py...> - 2003-04-30 14:28:20
William Dode wrote:
> Maybe. What is strange is that I use HTML output, and I can raise "éàè"
> without a problem...

HTML output uses UTF-8 encoding by default. It has no problem with
non-ASCII characters (unless you set the output encoding to US-ASCII).

> But why does Docutils change the encoding if the input is the same as the
> output?

I don't understand the question. Do you understand that the error you
saw was because the stderr stream couldn't handle non-ASCII characters
(which I described in yesterday's post)?

-- 
David Goodger
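The HTML output and sys.stderr are separate channels, each with its own encoding. This Python 3 snippet encodes the same string three ways. First it succeeds for a UTF-8 destination. Then it fails under strict US-ASCII. As an aside, it also shows what the standard `xmlcharrefreplace` error handler would produce for an ASCII-only HTML file. It only illustrates Python's codecs. It does not show which error handler Docutils itself applies to its output.

```python
text = "éàè"

# UTF-8 (the default HTML output encoding): every character fits.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)

# Strict US-ASCII: the same failure as on an ASCII-only stderr.
try:
    text.encode("us-ascii")
except UnicodeEncodeError as exc:
    print("UnicodeEncodeError:", exc.reason)

# ASCII with numeric character references: still valid HTML.
print(text.encode("us-ascii", "xmlcharrefreplace"))
```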
From: William D. <wi...@fl...> - 2003-04-30 14:44:58
David Goodger <go...@py...> writes:

> William Dode wrote:
> > Maybe. What is strange is that I use HTML output, and I can raise "éàè"
> > without a problem...
>
> HTML output uses UTF-8 encoding by default. It has no problem with
> non-ASCII characters (unless you set the output encoding to US-ASCII).
>
> > But why does Docutils change the encoding if the input is the same as the
> > output?
>
> I don't understand the question. Do you understand that the error you
> saw was because the stderr stream couldn't handle non-ASCII characters
> (which I described in yesterday's post)?

But stderr can handle non-ASCII, since I can raise "éàè"... The error is
in the str(...) call (but that doesn't change much).

I understand that it's a problem with my configuration more than a bug in
rst.

My question is: why, when I define the settings

    settings_overrides={'input_encoding':'latin-1',
                        'output_encoding':'latin-1',
                        'language_code':'fr'})

do I still have a UTF-8 string there? It seems that Docutils converts my
Latin-1 to UTF-8 and back to Latin-1.

-- 
William Dode - http://flibuste.net
From: David G. <go...@py...> - 2003-04-30 21:12:33
William Dode wrote:
> But stderr can handle non-ASCII, since I can raise "éàè"... The error is
> in the str(...) call (but that doesn't change much).

Are you using the latest CVS code (updated since 04-27)? If so, which
str() is causing the error? Please provide a complete traceback and input
file.

> I understand that it's a problem with my configuration more than a bug in
> rst.

Not necessarily!

> My question is: why, when I define the settings
>
>     settings_overrides={'input_encoding':'latin-1',
>                         'output_encoding':'latin-1',
>                         'language_code':'fr'})
>
> do I still have a UTF-8 string there?

Where?

> It seems that Docutils converts my Latin-1 to UTF-8 and back to Latin-1.

Docutils decodes all input and uses Unicode strings internally. Unicode
strings, not UTF-8-encoded 8-bit strings.

-- 
David Goodger  http://starship.python.net/~goodger
Programmer/sysadmin for hire: http://starship.python.net/~goodger/cv
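The decode-once, encode-once model can be sketched in modern Python 3, where `str` plays the role of the Python 2 Unicode string. The function `convert()` is hypothetical, and the `<p>` wrapping merely stands in for Docutils' processing. The middle step works on code points, so Latin-1 in with Latin-1 out never passes through UTF-8 bytes. If UTF-8 bytes do leak somewhere, the usual symptom is mojibake such as `Ã©` when they are read back as Latin-1.

```python
def convert(source, input_encoding, output_encoding):
    """Hypothetical pipeline: decode on input, process str, encode on output."""
    text = source.decode(input_encoding)    # bytes -> str (code points)
    html = "<p>" + text + "</p>"            # stand-in for the real processing
    return html.encode(output_encoding)     # str -> bytes


latin1_source = "éàè".encode("latin-1")      # b'\xe9\xe0\xe8' on disk
print(convert(latin1_source, "latin-1", "latin-1"))

# Symptom of UTF-8 bytes being read back as Latin-1 somewhere:
print("é".encode("utf-8").decode("latin-1"))
```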