From: SourceForge.net <no...@so...> - 2007-11-28 21:08:58
Bugs item #1840479, was opened at 2007-11-28 19:47
Message generated for change (Comment added) made by otmarhumbel
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=112867&aid=1840479&group_id=12867

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Core
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Jörg Höhle (hoehle)
Assigned to: Nobody/Anonymous (nobody)
Summary: coding: utf-8 and PEP 0263?

Initial Comment:
Hi,

My understanding of PEP 0263 is that the "coding: utf-8" declaration in the first line should influence how .py files are read. However, the PEP says "Python-Version: 2.3", whereas Jython 2.2 is documented as corresponding to Python 2.2:
http://www.python.org/dev/peps/pep-0263/

So this may not be a bug but a feature request. How can I use UTF-8 umlauts in my .py files with Jython?

# foo.py -*- coding: utf-8 -*-
# http://www.python.org/peps/pep-0263.html
inlineds = "zäöü!"
inlinedu = u"zäöü!"
explicits= "z\u00e4\u00f6\u00fc!"
explicitu= u"z\u00e4\u00f6\u00fc!"
all4=[inlineds,inlinedu,explicits,explicitu]
print all4, [len(s) for s in all4]

On a Red Hat 5 system this produces:
['z\xC3\xA4\xC3\xB6\xC3\xBC!', u'z\xC3\xA4\xC3\xB6\xC3\xBC!', 'z\\u00e4\\u00f6\\u00fc!', u'z\xE4\xF6\xFC!'] [8, 8, 20, 5]

Jython 2.2 on java1.6.0_05-ea
uname -a
Linux foo.xy 2.6.9-55.0.9.ELsmp #1 SMP Tue Sep 25 02:16:15 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
LANG=de_DE@UTF-8

Debian produces the expected results:
['z\xE4\xF6\xFC!', u'z\xE4\xF6\xFC!', 'z\\u00e4\\u00f6\\u00fc!', u'z\xE4\xF6\xFC!'] [5,5,20,5]

Jython 2.2 on java1.6.0_02
uname -a
Linux debianbasic 2.6.18-5-686 #1 ... i686 GNU/Linux
LANG=de_DE.UTF-8

However, even on the Debian system, changing $LANG gives:
LANG=C ./jython.sh foo.py
[u'z\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD!', u'z\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD!', 'z\\u00e4\\u00f6\\u00fc!', u'z\xE4\xF6\xFC!'] [8, 8, 20, 5]

Everything behaves as if Jython reads the .py file using Java's default encoding (which is influenced by $LANG but, AFAIK, cannot be set directly). Both
java.nio.charset.Charset.defaultCharset()
java.io.OutputStreamWriter(java.io.ByteArrayOutputStream()).getEncoding()
yield Java's default encoding.

I've now installed 2.2.1, and the results change, although still not satisfactorily. The Debian system now always yields:
['z\xC3\xA4\xC3\xB6\xC3\xBC!', u'z\xC3\xA4\xC3\xB6\xC3\xBC!', 'z\\u00e4\\u00f6\\u00fc!', u'z\xE4\xF6\xFC!'] [8, 8, 20, 5]
like Red Hat did before, regardless of $LANG. Thus Jython 2.2.1 seems to strictly assume ISO-8859-1 in .py files. At least the 2.2.1 behaviour is consistent across the two systems (Red Hat and Debian) I tested.

Regards,
Jörg Höhle

----------------------------------------------------------------------

>Comment By: Otmar Humbel (otmarhumbel)
Date: 2007-11-28 22:08

Message:
Logged In: YES
user_id=105844
Originator: NO

I am pretty sure it is a missing feature, since I've been missing it too.
Standalone mode should not make any difference here.

----------------------------------------------------------------------

Comment By: Jörg Höhle (hoehle)
Date: 2007-11-28 19:51

Message:
Logged In: YES
user_id=377168
Originator: YES

I should mention that I'm using standalone mode (for ease of use for my Java colleagues).

----------------------------------------------------------------------
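The symptom described in this report (each umlaut turning into two characters, or into U+FFFD under LANG=C) can be reproduced outside Jython. The sketch below is written in Python 3, not Jython 2.2, and its helper names (`declared_encoding`, `read_source`, `literal`, `FOO_PY`) are hypothetical; only the regular expression is taken from PEP 263. It decodes the same UTF-8 bytes three ways: honouring the coding declaration, assuming ISO-8859-1 (what 2.2.1 appears to do), and assuming US-ASCII (assumed here to approximate Java's default charset under LANG=C). It illustrates the effect only and is not Jython's actual source reader.

```python
import re

# Regular expression quoted in PEP 263 for a source-encoding declaration;
# it is only honoured on the first or second line of a file.
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(source: bytes, fallback: str) -> str:
    """Return the encoding named by a PEP 263 comment, else `fallback`."""
    for line in source.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    return fallback


def read_source(source: bytes, honour_pep263: bool, fallback: str = "latin-1") -> str:
    """Decode .py bytes. With honour_pep263=False the declaration is ignored
    and `fallback` is always used (hypothetical model of the behaviour the
    report attributes to Jython 2.2.1). Undecodable bytes become U+FFFD."""
    encoding = declared_encoding(source, fallback) if honour_pep263 else fallback
    return source.decode(encoding, errors="replace")


# foo.py from the report, reduced to one literal and stored on disk as UTF-8.
FOO_PY = '# foo.py -*- coding: utf-8 -*-\ns = "zäöü!"\n'.encode("utf-8")


def literal(text: str) -> str:
    """Extract the string between the double quotes on line 2."""
    return text.splitlines()[1].split('"')[1]


if __name__ == "__main__":
    for label, text in [
        ("PEP 263 honoured", read_source(FOO_PY, True)),
        ("ISO-8859-1 assumed", read_source(FOO_PY, False)),
        ("US-ASCII assumed", read_source(FOO_PY, False, "ascii")),
    ]:
        s = literal(text)
        print(f"{label:20} {s!r} len={len(s)}")
```

With the declaration honoured the literal is 5 characters long. Under the ISO-8859-1 assumption each two-byte UTF-8 umlaut becomes two characters, giving the length of 8 seen in the report, and under US-ASCII each of the six non-ASCII bytes is replaced by U+FFFD, which also yields 8.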