From: SourceForge.net <no...@so...> - 2007-09-25 23:13:26
Bugs item #1802339, was opened at 2007-09-26 02:13
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=112867&aid=1802339&group_id=12867

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Pekka Laukkanen (laukpe)
Assigned to: Nobody/Anonymous (nobody)
Summary: [221rc1] Problem printing unicode when stdout intercepted

Initial Comment:
Running the following code fails when using Jython 2.2.1 rc 1 but succeeds with Jython 2.2 (and earlier alphas/betas/rcs) and Python 2.3/2.4/2.5.

- - - - - - - - - -
import sys
from StringIO import StringIO

msg = u'Circle is 360\u00B0'

sys.stdout = StringIO()
print msg
assert sys.stdout.getvalue() == msg + '\n'
- - - - - - - - - -

The traceback is below the code and shows that printing a unicode string fails even though in this case stdout has been intercepted.

- - - - - - - - - -
Traceback (innermost last):
  File "unictest.py", line 7, in ?
UnicodeError: ascii encoding error: ordinal not in range(128)
- - - - - - - - - -

Being able to print unicode strings like this is crucial in our case. We've been implementing a test automation framework that runs on Python and Jython, and it can be extended using so-called test libraries, which can write messages to a common test log simply by writing to stdout. This way the API between the framework and libraries is pretty simple, and it works the same way both when a lib is written in Python and when it's written in Java (we intercept java.lang.System.out too).

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=112867&aid=1802339&group_id=12867
From: SourceForge.net <no...@so...> - 2007-09-26 04:17:09
Bugs item #1802339, was opened at 2007-09-26 00:13
Message generated for change (Comment added) made by pjenvey
----------------------------------------------------------------------

>Comment By: Philip Jenvey (pjenvey)
Date: 2007-09-26 05:17

Message:
Logged In: YES
user_id=145787
Originator: NO

This one actually fails on CPython 2.2, though; CPython > 2.2 calls PyObject_Str on anything printed. Jython doesn't have an equivalent function; in this case it just calls __str__ (in StdoutWrapper) on any object printed.

PyObject_Str looks like it's a safer version of __str__ for situations like these: it specially handles unicode objects, returning PyUnicode_AsEncodedString (which is like our encode_UnicodeEscape).

We could special-case unicode objects in StdoutWrapper, but I see PyObject_Str used in a few places in CPython, so patching StdoutWrapper might miss other cases where this is a problem:

$ grep -r PyObject_Str\( * | grep \.c:
Modules/_csv.c: str = PyObject_Str(field);
Modules/_tkinter.c: PyObject *v = PyObject_Str(value);
Modules/_tkinter.c: PyObject *v = PyObject_Str(value);
Objects/descrobject.c: return PyObject_Str(pp->dict);
Objects/fileobject.c: value = PyObject_Str(v);
Objects/object.c: s = PyObject_Str(op);
Objects/object.c:PyObject_Str(PyObject *v)
Objects/stringobject.c: op = (PyStringObject *) PyObject_Str((PyObject *)op);
Objects/stringobject.c: return PyObject_Str(x);
Objects/stringobject.c: temp = PyObject_Str(v);
Objects/stringobject.c: PyObject_Str() assure this */
Objects/unicodeobject.c: temp = PyObject_Str(v);
Objects/unicodeobject.c: PyObject_Repr() and PyObject_Str() assure
Python/bltinmodule.c: po = PyObject_Str(v);
Python/codecs.c: PyObject *string = PyObject_Str(name);
Python/errors.c: tmp = PyObject_Str(v);
Python/exceptions.c: out = PyObject_Str(tmp);
Python/exceptions.c: out = PyObject_Str(args);
Python/exceptions.c: str = PyObject_Str(msg);
Python/pythonrun.c: v = PyObject_Str(v);
Python/pythonrun.c: w = PyObject_Str(w);
Python/pythonrun.c: PyObject *s = PyObject_Str(value);

----------------------------------------------------------------------
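A minimal sketch of the failure mode discussed in the comment above, assuming that calling __str__ on a unicode object amounts to encoding it with the default codec (ascii unless changed). The helper naive_str is hypothetical and is not a Jython or CPython API; the snippet is written so that it also runs on current Python interpreters, where the conversion has to be spelled out explicitly.

```python
# Sketch only: imitates the implicit unicode -> byte-string conversion that
# print triggers in the report. naive_str is a hypothetical helper.
DEFAULT_ENCODING = 'ascii'  # the encoding named in the reported traceback


def naive_str(text, encoding=DEFAULT_ENCODING):
    """Encode a unicode string the way an implicit str() conversion would."""
    return text.encode(encoding)


msg = u'Circle is 360\u00B0'

try:
    naive_str(msg)            # the degree sign has no ASCII representation
    ascii_failed = False
except UnicodeError:          # UnicodeEncodeError is a subclass of UnicodeError
    ascii_failed = True

# A codec that can represent U+00B0 succeeds where ascii does not.
utf8_bytes = naive_str(msg, 'utf-8')
```

Special-casing unicode objects, as suggested above, would mean choosing the encoding at this conversion step instead of relying on the ascii default.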
From: SourceForge.net <no...@so...> - 2007-09-26 07:34:26
Bugs item #1802339, was opened at 2007-09-26 02:13
Message generated for change (Comment added) made by laukpe
----------------------------------------------------------------------

>Comment By: Pekka Laukkanen (laukpe)
Date: 2007-09-26 10:34

Message:
Logged In: YES
user_id=1379331
Originator: YES

This might be a bit involved for me to investigate and fix but if nobody else is doing it I can try. Getting the original example working would be a big step forward and even if other places were missed that would be better than nothing.

I hope that this failing on CPython 2.2 doesn't mean that it won't be fixed in Jython 2.2. At least for us that would be really inconvenient because it'll take some time before Jython 2.3 (or whatever the version will be) is released. We can of course instruct people needing to use unicode to stick with 2.2 but then they won't get any other fixes/features in 2.2.x releases.

----------------------------------------------------------------------
From: SourceForge.net <no...@so...> - 2007-09-27 04:42:29
Bugs item #1802339, was opened at 2007-09-25 18:13
Message generated for change (Comment added) made by cgroves
----------------------------------------------------------------------

>Comment By: Charles Groves (cgroves)
Date: 2007-09-26 23:42

Message:
Logged In: YES
user_id=1174327
Originator: NO

Without a patch in hand and a good understanding of the problem, I think this is too big of a change to attempt between release candidates. Even Philip's explanation isn't complete, because if CPython were just using unicode_escape on the printed objects, your final assert would fail: sys.stdout.getvalue() would have a str object in it, which isn't equal to the unicode object from above. It definitely passes, though. While 2.2.1 is too far along to fix this, I wouldn't mind making a 2.2.2 for this and whatever else comes up.

That said, as long as you're not relying on unicode objects coming out of getvalue (which I don't think could be the case, since that wouldn't have happened under 2.2 either), you might be able to get around this by setting the default encoding. The reason it's complaining about ascii is that ascii is the default default encoding. You can change that to any encoding supported by Jython in your site.py, and then whenever Jython attempts to turn a unicode object into a str without an explicit encoding, it'll use that encoding to do the work. It works the same in the opposite direction when decoding a str into a unicode object without an explicit encoding.

----------------------------------------------------------------------
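The default-encoding workaround described in this message could look roughly like the sketch below. It assumes a Python 2 style interpreter that still exposes sys.setdefaultencoding while site or sitecustomize code runs, which is how CPython 2 behaves; whether Jython 2.2 offers the same hook is an assumption here. The name PREFERRED is made up for the example, and the hasattr guard turns the call into a no-op on interpreters without the hook.

```python
# sitecustomize.py: sketch only, not taken from the thread.
import sys

PREFERRED = 'utf-8'  # assumption: any codec the interpreter supports would do

if hasattr(sys, 'setdefaultencoding'):
    # Only available on Python 2 style interpreters during startup.
    sys.setdefaultencoding(PREFERRED)

# The explicit equivalent of what the default encoding does implicitly,
# in both directions:
text = u'Circle is 360\u00B0'
as_bytes = text.encode(PREFERRED)        # unicode -> byte string
round_trip = as_bytes.decode(PREFERRED)  # byte string -> unicode
```

Changing the default encoding affects every implicit conversion in the process, so it is a process-wide setting rather than a fix for print alone.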
From: SourceForge.net <no...@so...> - 2007-09-27 22:52:43
Bugs item #1802339, was opened at 2007-09-26 02:13
Message generated for change (Comment added) made by laukpe
----------------------------------------------------------------------

>Comment By: Pekka Laukkanen (laukpe)
Date: 2007-09-28 01:52

Message:
Logged In: YES
user_id=1379331
Originator: YES

Philip pointed me to StdoutWrapper, and after playing with it a little bit I was able to come up with a simple patch (attached) that makes the original example pass. I ran dist/Lib/test/regrtest.py on the 2.2 maint branch both w/ and w/o the patch and got the same failures, so it doesn't break everything.

I have to confess that I don't really know the code in StdoutWrapper nor the code using it, so I may very well be missing something totally obvious. The patch is rather ugly (catching Throwable is probably not the best idea) and should be taken as a prototype at this phase.

File Added: unic.patch

----------------------------------------------------------------------
From: SourceForge.net <no...@so...> - 2007-09-30 01:49:40
Bugs item #1802339, was opened at 2007-09-25 18:13
Message generated for change (Comment added) made by cgroves

----------------------------------------------------------------------

>Comment By: Charles Groves (cgroves)
Date: 2007-09-29 20:49

Message:
Logged In: YES
user_id=1174327
Originator: NO

I don't think this patch is going in the right direction. Rather than slipping in a quick fix for this particular case, we need to figure out exactly what CPython was doing in 2.2 and what CPython is doing currently. If the current behavior won't break 2.2's expectations in a horrible way, we can add it to our 2.2. Just shoehorning a fix in for this one case could lead to weirdly inconsistent behavior in different parts of the code, which I really want to avoid.

Did you try setting the default encoding? You can do it from Java with org.python.core.codecs.setDefaultEncoding.

----------------------------------------------------------------------
From: SourceForge.net <no...@so...> - 2007-09-30 22:21:51
Bugs item #1802339, was opened at 2007-09-26 02:13
Message generated for change (Comment added) made by laukpe
----------------------------------------------------------------------

>Comment By: Pekka Laukkanen (laukpe)
Date: 2007-10-01 01:21

Message:
Logged In: YES
user_id=1379331
Originator: YES

I totally agree that fixing this issue with a hack that just seems to solve the problems is not the right thing to do. My patch was just an example showing that somehow modifying StdoutWrapper might be a part of the solution. Unfortunately I don't understand Jython (nor CPython) internals well enough to be able to figure out a real fix. =/

Thanks for mentioning org.python.core.codecs.setDefaultEncoding. I played with it a little and it seems that we could even have a workaround for the problem in our system. I changed my original example slightly and was able to get "print <unicode>" working. There are still some differences between different Jython versions and CPython but we should be able to handle them. Here's the new code:

- - - - - - - - - -
import sys
import os
from StringIO import StringIO

if os.name == 'java':
    from org.python.core import codecs
    codecs.setDefaultEncoding('utf-8')
    print 'Jython', sys.version
else:
    print 'Python', sys.version

sys.stdout = StringIO()
msg = u'Circle is 360\u00B0'
print msg
out = sys.stdout.getvalue()
sys.stdout = sys.__stdout__

print out, type(out)
print msg, type(msg)

assert out == msg + '\n'
- - - - - - - - - -

And here are the outputs using a few different interpreters:

- - - - - - - - - -
Jython 2.2rc3
Circle is 360° <type 'str'>
Circle is 360° <type 'unicode'>
- - - - - - - - - -
Jython 2.2.1rc1
Circle is 360° <type 'str'>
Circle is 360° <type 'unicode'>
Traceback (innermost last):
  File "unictest.py", line 21, in ?
AssertionError:
- - - - - - - - - -
Python 2.5.1 (r251:54863, May 2 2007, 16:56:35) [GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)]
Circle is 360° <type 'unicode'>
Circle is 360° <type 'unicode'>

----------------------------------------------------------------------

Comment By: Charles Groves (cgroves)
Date: 2007-09-30 04:49

Message:
Logged In: YES
user_id=1174327
Originator: NO

I don't think this patch is going in the right direction. Rather than slipping in a quick fix for this particular case, we need to figure out exactly what CPython was doing in 2.2 and what CPython is doing currently. If the current behavior won't break 2.2's expectations in a horrible way, we can add it to our 2.2. Just shoehorning a fix in for this one case could lead to weirdly inconsistent behavior in different parts of the code, which I really want to avoid.

Did you try setting the default encoding? You can do it from Java with org.python.core.codecs.setDefaultEncoding.

----------------------------------------------------------------------

Comment By: Pekka Laukkanen (laukpe)
Date: 2007-09-28 01:52

Message:
Logged In: YES
user_id=1379331
Originator: YES

Philip pointed me to StdoutWrapper and after playing with it a little bit I was able to come up with a simple patch (attached) that makes the original example pass. I ran dist/Lib/test/regrtest.py on the 2.2 maint branch both w/ and w/o the patch and got the same failures so it doesn't break everything.

I have to confess that I don't really know the code in StdoutWrapper nor the code using it so I may very well be missing something totally obvious. The patch is rather ugly (catching Throwable is probably not the best idea) and should be taken as a prototype at this phase.
File Added: unic.patch

----------------------------------------------------------------------

Comment By: Charles Groves (cgroves)
Date: 2007-09-27 07:42

Message:
Logged In: YES
user_id=1174327
Originator: NO

Without a patch in hand and a good understanding of the problem, I think this is too big of a change to attempt between release candidates. Even Philip's explanation below isn't complete because if CPython were just using unicode_escape on the printed objects, your final assert would fail. sys.stdout.getvalue() would have a str object in it which isn't equal to the unicode object from above. It definitely passes though. While 2.2.1 is too far along to fix this, I wouldn't mind making a 2.2.2 for this and whatever else comes up.

That said, as long as you're not relying on unicode objects coming out of getvalue (which I don't think could be the case since that wouldn't have happened under 2.2 either), you might be able to get around this by setting the default encoding. The reason it's complaining about ascii is because ascii is the default default encoding. You can change that to any encoding supported by Jython in your site.py, and then whenever Jython attempts to turn a unicode object into a str without an explicit encoding, it'll use that encoding to do the work. It works the same in the opposite direction when decoding a str into a unicode object without an explicit encoding.

----------------------------------------------------------------------

Comment By: Pekka Laukkanen (laukpe)
Date: 2007-09-26 10:34

Message:
Logged In: YES
user_id=1379331
Originator: YES

This might be a bit involved for me to investigate and fix but if nobody else is doing it I can try. Getting the original example working would be a big step forward and even if other places were missed that would be better than nothing.

I hope that this failing on CPython 2.2 doesn't mean that it won't be fixed in Jython 2.2.
At least for us that would be really inconvenient because it'll take some time before Jython 2.3 (or whatever the version will be) is released. We can of course instruct people needing to use unicode to stick with 2.2 but then they won't get any other fixes/features in 2.2.x releases.

----------------------------------------------------------------------

Comment By: Philip Jenvey (pjenvey)
Date: 2007-09-26 07:17

Message:
Logged In: YES
user_id=145787
Originator: NO

This one actually fails on CPython 2.2, though.

CPython > 2.2 calls PyObject_Str on anything printed. Jython doesn't have an equivalent function; in this case it just calls __str__ (in StdoutWrapper) on any object printed.

PyObject_Str looks like it's a safer version of __str__ for situations like these: it specially handles unicode objects, returning PyUnicode_AsEncodedString (which is like our encode_UnicodeEscape).

We could special case unicode objects in StdoutWrapper, but I see PyObject_Str used in a few places in CPython. So patching StdoutWrapper might miss other cases where this is a problem.

$ grep -r PyObject_Str\( * | grep \.c:
Modules/_csv.c: str = PyObject_Str(field);
Modules/_tkinter.c: PyObject *v = PyObject_Str(value);
Modules/_tkinter.c: PyObject *v = PyObject_Str(value);
Objects/descrobject.c: return PyObject_Str(pp->dict);
Objects/fileobject.c: value = PyObject_Str(v);
Objects/object.c: s = PyObject_Str(op);
Objects/object.c:PyObject_Str(PyObject *v)
Objects/stringobject.c: op = (PyStringObject *) PyObject_Str((PyObject *)op);
Objects/stringobject.c: return PyObject_Str(x);
Objects/stringobject.c: temp = PyObject_Str(v);
Objects/stringobject.c: PyObject_Str() assure this */
Objects/unicodeobject.c: temp = PyObject_Str(v);
Objects/unicodeobject.c: PyObject_Repr() and PyObject_Str() assure
Python/bltinmodule.c: po = PyObject_Str(v);
Python/codecs.c: PyObject *string = PyObject_Str(name);
Python/errors.c: tmp = PyObject_Str(v);
Python/exceptions.c: out = PyObject_Str(tmp);
Python/exceptions.c: out = PyObject_Str(args);
Python/exceptions.c: str = PyObject_Str(msg);
Python/pythonrun.c: v = PyObject_Str(v);
Python/pythonrun.c: w = PyObject_Str(w);
Python/pythonrun.c: PyObject *s = PyObject_Str(value);

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=112867&aid=1802339&group_id=12867
|
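The two explanations given in this thread (a print path that special-cases unicode objects, and a default encoding that starts out as ascii) can be modelled together in a few lines. What follows is only a rough sketch, written in Python 3 so that it runs on a current interpreter: `str` stands in for the 2.x `unicode` type and `bytes` for the 2.x `str`. The names `str_for_print` and `print_to` are hypothetical and are not Jython's StdoutWrapper or CPython's PyObject_Str; the assumption that text is encoded with the default encoding is a simplification of what the comments describe.

```python
import io


def str_for_print(obj, default_encoding="ascii"):
    """Hypothetical stand-in for a PyObject_Str-like helper.

    Assumption: text is turned into bytes with the default encoding,
    and any other object is first converted with str().
    """
    text = obj if isinstance(obj, str) else str(obj)
    return text.encode(default_encoding)


def print_to(stream, obj, default_encoding="ascii"):
    # Models `print obj` writing to an intercepted, byte-oriented stdout.
    stream.write(str_for_print(obj, default_encoding) + b"\n")


msg = "Circle is 360\u00b0"

# With the ascii default, the degree sign cannot be encoded.
try:
    print_to(io.BytesIO(), msg)
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# With utf-8 as the default, the write goes through, but the captured
# value is encoded bytes rather than text.
captured = io.BytesIO()
print_to(captured, msg, default_encoding="utf-8")
out = captured.getvalue()
```

Under these assumptions the ascii default reproduces the encoding error, while utf-8 lets the write succeed but leaves encoded bytes in the buffer instead of text. That would be in line with the `<type 'str'>` result reported for Jython above, and it is why comparing the captured value directly with the original message needs an explicit decode.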