From: SourceForge.net <no...@so...> - 2007-12-08 22:22:56
Bugs item #1840479, was opened at 2007-11-28 13:47
Message generated for change (Comment added) made by cgroves
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=112867&aid=1840479&group_id=12867

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Core
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Jörg Höhle (hoehle)
Assigned to: Nobody/Anonymous (nobody)
Summary: coding: utf-8 and PEP 0263?

Initial Comment:
Hi,

My understanding of PEP0263 is that the "coding: utf-8" in the first line should influence the reading of .py files.
Alas, the PEP says:
Python-Version: 2.3
whereas jython-2.2 is documented as corresponding to Python 2.2.
http://www.python.org/dev/peps/pep-0263/
So possibly mine is not a bug, but a feature request.

How can I use UTF-8 umlauts in my .py files with Jython?

# foo.py -*- coding: utf-8 -*- http://www.python.org/peps/pep-0263.html
inlineds = "zäöü!"
inlinedu = u"zäöü!"
explicits= "z\u00e4\u00f6\u00fc!"
explicitu= u"z\u00e4\u00f6\u00fc!"
all4=[inlineds,inlinedu,explicits,explicitu]
print all4, [len(s) for s in all4]

On a RedHat 5 system this produces:
['z\xC3\xA4\xC3\xB6\xC3\xBC!', u'z\xC3\xA4\xC3\xB6\xC3\xBC!', 'z\\u00e4\\u00f6\\u00fc!', u'z\xE4\xF6\xFC!'] [8, 8, 20, 5]
Jython 2.2 on java1.6.0_05-ea
uname -a
Linux foo.xy 2.6.9-55.0.9.ELsmp #1 SMP Tue Sep 25 02:16:15 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
LANG=de_DE@UTF-8

Debian produces expected results:
['z\xE4\xF6\xFC!', u'z\xE4\xF6\xFC!', 'z\\u00e4\\u00f6\\u00fc!', u'z\xE4\xF6\xFC!'] [5,5,20,5]
Jython 2.2 on java1.6.0_02
uname -a
Linux debianbasic 2.6.18-5-686 #1 ...
i686 GNU/Linux
LANG=de_DE.UTF-8

However, even on the Debian system changing $LANG gives
LANG=C ./jython.sh foo.py
[u'z\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD!', u'z\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD!', 'z\\u00e4\\u00f6\\u00fc!', u'z\xE4\xF6\xFC!'] [8, 8, 20, 5]

All happens as if Jython reads the .py file using Java's default encoding (which is influenced by $LANG but cannot directly be set AFAIK).
java.nio.charset.Charset.defaultCharset()
java.io.OutputStreamWriter(java.io.ByteArrayOutputStream()).getEncoding()
yields Java's default encoding.

I've now installed 2.2.1 and results change, although still not satisfactorily.
The Debian system now always yields:
['z\xC3\xA4\xC3\xB6\xC3\xBC!', u'z\xC3\xA4\xC3\xB6\xC3\xBC!', 'z\\u00e4\\u00f6\\u00fc!', u'z\xE4\xF6\xFC!'] [8, 8, 20, 5]
like Redhat before, regardless of $LANG.
Thus jython-2.2.1 seems to strictly assume ISO-8859-1 in .py files.
At least 2.2.1 behaviour is consistent between the two Redhat and Debian systems I tested.

Regards,
Jörg Höhle

----------------------------------------------------------------------

>Comment By: Charles Groves (cgroves)
Date: 2007-12-08 17:22

Message:
Logged In: YES
user_id=1174327
Originator: NO

Yes, this is just a missing feature. One of the major changes for 2.2.1 was to no longer use Charset.defaultCharset: it introduces unpredictable behavior between platforms as you saw. PEP 263 will definitely appear in the next major version of Jython. For now you're stuck using explicit unicode escapes to get umlauts in .py files.

----------------------------------------------------------------------

Comment By: Otmar Humbel (otmarhumbel)
Date: 2007-11-28 16:08

Message:
Logged In: YES
user_id=105844
Originator: NO

I am pretty sure it is a missing feature, since I've been missing it too.
Standalone mode should not make any difference here.

----------------------------------------------------------------------

Comment By: Jörg Höhle (hoehle)
Date: 2007-11-28 13:51

Message:
Logged In: YES
user_id=377168
Originator: YES

I should mention that I'm using standalone-mode (for ease of use for my Java colleagues).

----------------------------------------------------------------------
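
The string lengths reported in the initial comment (5, 8 and 8) are consistent with the same UTF-8 bytes being decoded under three different charsets. The sketch below reproduces that arithmetic and also generates the escaped form suggested as a work-around. It is written for CPython 3 and only simulates the decoding step; it is not Jython's source-loading code. The names read_literal and to_escaped_literal are hypothetical helpers invented for this illustration, and the idea that LANG=C makes Java fall back to an ASCII default charset is an assumption drawn from the reported output, not something the thread states.

```python
# Sketch only: simulates how one UTF-8 encoded literal looks to readers that
# assume different charsets. Runs on CPython 3; not Jython's real loader.

SOURCE_TEXT = "z\u00e4\u00f6\u00fc!"      # the literal from foo.py, written with escapes
UTF8_BYTES = SOURCE_TEXT.encode("utf-8")  # the bytes a UTF-8 editor stores on disk


def read_literal(raw, charset):
    """Hypothetical helper: decode on-disk bytes as a reader assuming `charset` would."""
    # errors="replace" turns each undecodable byte into U+FFFD
    return raw.decode(charset, errors="replace")


def to_escaped_literal(text):
    """Hypothetical helper: render text as a pure-ASCII u"..." literal (BMP only)."""
    parts = []
    for ch in text:
        code = ord(ch)
        if code > 0xFFFF:
            raise ValueError("only BMP characters are handled in this sketch")
        if ch in '"\\' or code < 0x20 or code > 0x7E:
            parts.append("\\u%04x" % code)  # escape anything that is not plain printable ASCII
        else:
            parts.append(ch)
    return 'u"%s"' % "".join(parts)


as_utf8 = read_literal(UTF8_BYTES, "utf-8")         # what PEP 263 support would give
as_latin1 = read_literal(UTF8_BYTES, "iso-8859-1")  # every byte becomes one character
as_ascii = read_literal(UTF8_BYTES, "ascii")        # assumed equivalent of LANG=C

for label, text in [("utf-8", as_utf8), ("iso-8859-1", as_latin1), ("ascii", as_ascii)]:
    print(label, ascii(text), len(text))

# The escaped literal contains only ASCII, so its meaning does not depend on
# which of the three charsets the reader assumes.
print(to_escaped_literal(SOURCE_TEXT))
```

Decoding as UTF-8 gives 5 characters, while ISO-8859-1 and ASCII both give 8, with ASCII replacing each of the six non-ASCII bytes by U+FFFD. Because the escaped literal is pure ASCII, it reads the same under all three assumptions, which is presumably why explicit unicode escapes are the suggested interim work-around.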