#386 Console should support UTF-8

closed-fixed
nobody
None
5
2008-04-07
2006-10-19
No

The PyDev console window should support UTF-8 output.

To find out whether it works, do

print u"Martin v. L\xF6wis"

in a script. This currently gives the error

print u"Martin v. L\xF6wis"
UnicodeEncodeError: 'ascii' codec can't encode
character u'\xf6' in position 11: ordinal not in range(128)

This, in turn, is due to sys.stdout.encoding being
None. It should be set to UTF-8 (IMO, or to the Eclipse
default encoding, wherever that comes from), and PyDev
should decode all incoming bytes from that encoding.
Likewise for sys.stdin.encoding.
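
The failure can be reproduced without a console by doing the encode step by hand. This is only a sketch: it assumes that, with sys.stdout.encoding being None, Python 2's print falls back to the ASCII default encoding, and the helper name try_encode is made up for the illustration.

```python
# -*- coding: utf-8 -*-
# Sketch: reproduce the encode step that fails when no encoding is known.
text = u"Martin v. L\xF6wis"


def try_encode(s, encoding):
    """Hypothetical helper: return the encoded bytes, or the exception raised."""
    try:
        return s.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc


ascii_result = try_encode(text, "ascii")   # fails on u'\xf6'
utf8_result = try_encode(text, "utf-8")    # succeeds

print(repr(ascii_result))
print(repr(utf8_result))
```

With an explicit UTF-8 encoding the same string encodes without error, which is the behaviour requested for the console.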

Discussion

  • Fabio Zadrozny

    Fabio Zadrozny - 2006-10-23

    Logged In: YES
    user_id=617340

    Humm... strange... if you do:

    print u"Martin v. L\xF6wis".encode('cp1252')

    or if you do directly:

    print "Martin v. Löwis"

    it works ok (I believe that the encoding of the buffer is
    the default for the platform)... this seems more like a
    python issue to me than an Eclipse issue (don't you think so?)

    Cheers,

    Fabio

     
  • Fabio Zadrozny

    Fabio Zadrozny - 2006-10-23
    • status: open --> pending-works-for-me
     
  • Martin v. Löwis

    • status: pending-works-for-me --> open-works-for-me
     
  • Martin v. Löwis

    Logged In: YES
    user_id=21627

    Please try this in a console/terminal window, on Unix or
    Linux, or in IDLE.

    Python prints Unicode strings by looking at sys.stdout.encoding,
    and the encoding found there is then used to encode the Unicode
    string. Printing a byte string just transmits it literally to the
    terminal, so it's no surprise that this "works".

    The precise procedure that Python uses depends on the operating
    system. On Unix, Python checks whether stdout is a terminal
    (through isatty); if it is, it then uses the locale's charset (by
    invoking nl_langinfo(CODESET)) to find out the terminal's
    encoding. On Windows, it uses GetConsoleOutputCP to determine the
    encoding of the console window. In IDLE, IDLE replaces sys.stdout
    with something else (so output ends up in IDLE's shell window),
    and arranges to set the encoding on this "something else"
    explicitly.
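
The lookup described above can be approximated in pure Python. This is a rough sketch, not the interpreter's actual C code: the function name guess_terminal_encoding is hypothetical, and the Windows branch assumes ctypes is available.

```python
import locale
import sys


def guess_terminal_encoding(stream):
    """Hypothetical helper: best-guess encoding of a terminal stream, or None."""
    isatty = getattr(stream, "isatty", None)
    if isatty is None or not isatty():
        # Not a terminal (e.g. a pipe, as with the Eclipse console):
        # nothing can be inferred, which matches the None seen in PyDev.
        return None
    if sys.platform == "win32":
        import ctypes
        # Console output code page, e.g. 437 or 1252.
        return "cp%d" % ctypes.windll.kernel32.GetConsoleOutputCP()
    # Unix: ask the locale for its charset. The answer only reflects the
    # user's environment if LC_CTYPE has been set from it beforehand.
    return locale.nl_langinfo(locale.CODESET) or None


print(guess_terminal_encoding(sys.stdout))
```

When the process is started by Eclipse, stdout is a pipe rather than a terminal, so the first branch applies and no encoding is found.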

    I'm not sure which of these strategies should work best for
    PyDev. However, it's clearly not Python's issue *alone* to figure
    out the encoding of sys.stdout when running in PyDev: Python would
    need some mechanism to find out that it is indeed running in
    PyDev, or PyDev should arrange to set up sys.stdout.encoding
    explicitly.

    In any case, I can't follow your "Works for Me" interpretation: I
    very much doubt that the original example I've given actually
    works for you.

     
  • Fabio Zadrozny

    Fabio Zadrozny - 2006-10-28
    • status: open-works-for-me --> open
     
  • Fabio Zadrozny

    Fabio Zadrozny - 2006-10-28

    Logged In: YES
    user_id=617340

    Hummm... yeah, the problem is that the console in Eclipse is
    not actually a "real" console... I'll have to take a better
    look at the Eclipse API to see if it actually has some way
    of setting it... (I've already taken a quick look without
    any success).

     
  • Martin v. Löwis

    Logged In: YES
    user_id=21627

    I see... it seems Java doesn't support creating processes in
    a pseudo-terminal at all.

    In that case, I think it would be possible to manually set
    the encoding of sys.stdout, through PYTHONSTARTUP. The
    startup code could be generated on the fly, to match the
    console's encoding (which is given through the
    DebugPlugin.ATTR_CONSOLE_ENCODING configuration AFAICT). It
    would have to create a wrapper for sys.std{in|out|err},
    since their encoding attribute is read-only, and any
    original PYTHONSTARTUP file would need to be execfile'd.
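
A wrapper of the kind suggested here could look roughly like the following. It is a minimal sketch under assumptions: the class name EncodingWriter is invented, the encoding is hard-coded to UTF-8 instead of being generated from the console configuration, and an in-memory byte buffer stands in for the real stdout.

```python
import io


class EncodingWriter(object):
    """Hypothetical wrapper that carries its own writable encoding attribute."""

    def __init__(self, raw, encoding, errors="strict"):
        self._raw = raw            # underlying byte stream
        self.encoding = encoding   # settable here, unlike on the real stdout
        self.errors = errors

    def write(self, data):
        if isinstance(data, bytes):
            # Byte strings are passed through untouched.
            self._raw.write(data)
        else:
            # Unicode text is encoded with the configured encoding.
            self._raw.write(data.encode(self.encoding, self.errors))

    def __getattr__(self, name):
        # Delegate everything else (flush, close, ...) to the wrapped stream.
        return getattr(self._raw, name)


buf = io.BytesIO()                   # stands in for the real stdout
out = EncodingWriter(buf, "utf-8")
out.write(u"Martin v. L\xF6wis\n")
print(repr(buf.getvalue()))
```

Generated startup code would then assign such a wrapper to sys.stdout and sys.stderr, with a matching decoding wrapper for sys.stdin.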

     
  • Fabio Zadrozny

    Fabio Zadrozny - 2007-01-16

    Logged In: YES
    user_id=617340
    Originator: NO

    Actually, PYTHONSTARTUP appears to work only in the interactive
    console, and I'm not really sure this is the best option... isn't
    there any way to pass this to the interpreter (like python -u)?
    -- It would be much better than making this kind of workaround, or
    python trying to discover which encoding it should use (as
    sys.stdout.encoding is read-only, there should be an option to set
    it... or not?)

     
  • Fabio Zadrozny

    Fabio Zadrozny - 2007-06-19

    Logged In: YES
    user_id=617340
    Originator: NO

    Changing to bug...

     
  • Fabio Zadrozny

    Fabio Zadrozny - 2008-04-07
    • status: open --> closed-fixed
     
  • Fabio Zadrozny

    Fabio Zadrozny - 2008-04-07

    Logged In: YES
    user_id=617340
    Originator: NO

    Fixed for 1.3.15

    The final solution was creating a 'sitecustomize.py' which is
    always added to the pythonpath as the 1st path (and then removed,
    in order to execute a 'sitecustomize.py' that may be defined by
    the user).

    In this module, 'sys.setdefaultencoding' can be used, as the
    module is imported just before that method is deleted.
    It can be seen at: http://pydev.cvs.sourceforge.net/pydev/org.python.pydev/PySrc/pydev_sitecustomize/
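
The core of such a module might look like the sketch below. It is not the actual PyDev file: the function name apply_default_encoding and the hard-coded "utf-8" are assumptions, and the step that hands over to a user-defined sitecustomize.py is left out.

```python
import sys


def apply_default_encoding(encoding):
    """Hypothetical helper: set the default encoding if the hook still exists.

    sys.setdefaultencoding is only present while site.py is still running
    on Python 2; it is deleted afterwards and absent on Python 3.
    """
    setter = getattr(sys, "setdefaultencoding", None)
    if setter is None:
        return False   # too late in startup, or an interpreter without it
    setter(encoding)
    return True


# Assumption: the launcher would supply the console encoding here.
applied = apply_default_encoding("utf-8")
print(applied)
```

Run as a normal script the hook is already gone, so nothing changes; only code imported as sitecustomize during startup still sees it.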