From: Pekka K. <re...@bu...> - 2012-02-17 22:45:54
New submission from Pekka Klärck <pe...@ik...>:

On my Linux machine with UTF-8 system encoding I got the following:

    $ a=ä python
    Python 2.6.6 (r266:84292, Sep 15 2010, 15:52:39)
    [GCC 4.4.5] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import os
    >>> os.environ['a']
    '\xc3\xa4'
    >>> _.decode('UTF-8')
    u'\xe4'

    $ a=ä jython
    Jython 2.5.2 (Release_2_5_2:7206, Mar 2 2011, 23:12:06)
    [Java HotSpot(TM) Server VM (Sun Microsystems Inc.)] on java1.6.0_21
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import os
    >>> os.environ['a']
    '\xe4'

I have seen Jython return similarly wrong bytes earlier (e.g. #1592 and #1593) and know that I can decode them using this hack:

    >>> from java.lang import String
    >>> String(os.environ['a']).toString()
    u'\xe4'

The problem is that if I set environment variables myself and encode them correctly, the hack doesn't work:

    >>> os.environ['b'] = u'\xe4'.encode('UTF-8')
    >>> String(os.environ['b']).toString()
    u'\xc3\xa4'

In other words, I need to know whether the value was set before or during the execution. It turns out that I can actually find that out using java.lang.System.getenv, which only knows about values set before the execution:

    >>> from java.lang.System import getenv
    >>> getenv('a')
    u'\xe4'
    >>> getenv('b') is None
    True

Notice also how getenv above returned the correct value as Unicode.

----------
messages: 6782
nosy: pekka.klarck
severity: normal
status: open
title: Non-ASCII environment variables are encoded incorrectly in os.environ

_______________________________________
Jython tracker <re...@bu...>
<http://bugs.jython.org/issue1841>
_______________________________________
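
The workaround described in the report above can be sketched roughly as follows. This is only a model that runs on plain CPython, not Jython's implementation: the names `hack_decode`, `resolve`, `inherited` and `environ` are hypothetical, with `inherited` standing in for java.lang.System.getenv and `environ` for os.environ. It assumes that the String(...).toString() hack maps each byte to one code point, which matches the outputs shown in the report and is the same mapping as Latin-1 decoding.

```python
# Model of the behaviour in the report, runnable on CPython.
# The dicts are stand-ins (assumptions), not Jython's real objects.

def hack_decode(raw):
    """Mimic String(value).toString(): each byte becomes one code point,
    which is the same mapping as Latin-1 decoding."""
    return raw.decode("latin-1")


def resolve(name, inherited, environ, encoding="utf-8"):
    """Return the variable as Unicode.

    inherited -- models System.getenv: Unicode values, only variables
                 that existed before the interpreter started
    environ   -- models os.environ: byte strings, including values
                 set (and correctly encoded) during execution
    """
    value = inherited.get(name)
    if value is not None:
        # Set before execution: the Java side already has it as Unicode.
        return value
    # Set during execution: the caller encoded it, so decode normally.
    return environ[name].decode(encoding)


# 'a' was set in the shell, 'b' was set at runtime as UTF-8 bytes.
inherited = {"a": u"\xe4"}
environ = {
    "a": b"\xe4",                  # what Jython returned in the report
    "b": u"\xe4".encode("utf-8"),  # what the script stored itself
}
```

With these stand-ins, `hack_decode` recovers `a` but mangles `b`, while `resolve` returns u'\xe4' for both. Note that in this sketch a variable that was inherited and later overwritten at runtime would still resolve to the inherited value.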