Thread: [Jython-bugs] [ jython-Bugs-1659819 ] Joining unicode items with string doesn't create unicode

[Jython-bugs] [ jython-Bugs-1659819 ] Joining unicode items with string doesn't create unicode

From: SourceForge.net <no...@so...> - 2007-02-14 15:16:31

Bugs item #1659819, was opened at 2007-02-14 17:16
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=112867&aid=1659819&group_id=12867

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Core
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Pekka Laukkanen (laukpe)
Assigned to: Nobody/Anonymous (nobody)
Summary: Joining unicode items with string doesn't create unicode

Initial Comment:
When you join a list of unicode strings with a normal string (e.g. "x = ' '.join([u'a',u'b'])"), the result is a normal string instead of a unicode string. If the separator itself is a unicode string, the result is unicode as expected. If you later use the string returned in the former case as unicode (e.g. "unicode(x)"), you get a UnicodeError whenever the original unicode strings contained non-ASCII characters. In CPython you get a unicode string in both cases, as you would expect.

See the examples using Jython 2.2b1 and Python 2.4.3 below. In Jython 2.2a1 things work differently due to http://jython.org/bugs/1538001, which is fixed in the beta.


Jython 2.2b1 on java1.5.0_10 (JIT: null)
Type "copyright", "credits" or "license" for more information.
>>> ul = [u'Hyv\u00E4',u'Good']
>>> 
>>> x = ' '.join(ul)
>>> type(x)
<type 'str'>
>>> unicode(x)
Traceback (innermost last):
  File "<console>", line 1, in ?
UnicodeError: ascii decoding error: ordinal not in range(128)
>>>
>>> y = u' '.join(ul)
>>> type(y)
<type 'unicode'>
>>> unicode(y)
u'Hyv\xE4 Good'
>>>


Python 2.4.3 (#1, May 18 2006, 07:40:45) 
[GCC 3.3.3 (cygwin special)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> ul = [u'Hyv\u00E4',u'Good']
>>> 
>>> x = ' '.join(ul)
>>> type(x)
<type 'unicode'>
>>> unicode(x)
u'Hyv\xe4 Good'
>>> 
>>> y = u' '.join(ul)
>>> type(y)
<type 'unicode'>
>>> unicode(y)
u'Hyv\xe4 Good'
>>> 
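
Until this is fixed, one way to sidestep the join case is to make the separator unicode before joining, since the Jython 2.2b1 transcript above shows that u' '.join(ul) already returns unicode. A minimal sketch: the helper name join_as_unicode is made up for illustration, and it assumes the separator is pure ASCII so that coercing it cannot fail.

```python
# Hypothetical helper, not part of Jython or CPython.
UNICODE_TYPE = type(u'')  # the unicode type on Python/Jython 2.x


def join_as_unicode(separator, items):
    """Join items with a separator that is coerced to unicode first."""
    # Assumes an ASCII-only separator, so the coercion cannot raise.
    return UNICODE_TYPE(separator).join(items)


joined = join_as_unicode(' ', [u'Hyv\u00e4', u'Good'])
```

This only works around join; it does nothing for the other operations discussed in this item.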


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=112867&aid=1659819&group_id=12867

[Jython-bugs] [ jython-Bugs-1659819 ] Joining unicode items with string doesn't create unicode

From: SourceForge.net <no...@so...> - 2007-02-15 10:35:24

Bugs item #1659819, was opened at 2007-02-14 17:16
Message generated for change (Comment added) made by laukpe

>Comment By: Pekka Laukkanen (laukpe)
Date: 2007-02-15 12:35

Message:
Logged In: YES 
user_id=1379331
Originator: YES

The same problem also appears if you create a string using a pattern like
'Something %s' and the substituted value is unicode. The examples below
demonstrate this.


Jython 2.2b1 on java1.5.0_10 (JIT: null)
Type "copyright", "credits" or "license" for more information.
>>> x = 'Good %s' % u'Hyv\u00E4'
>>> type(x)
<type 'str'>
>>> x
'Good Hyv\xE4'
>>> unicode(x)
Traceback (innermost last):
  File "<console>", line 1, in ?
UnicodeError: ascii decoding error: ordinal not in range(128)
>>>

Python 2.4.3 (#1, May 18 2006, 07:40:45) 
[GCC 3.3.3 (cygwin special)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> x = 'Good %s' % u'Hyv\u00E4'
>>> type(x)
<type 'unicode'>
>>> x
u'Good Hyv\xe4'
>>> unicode(x)
u'Good Hyv\xe4'
>>> 
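
The UnicodeError in the Jython session follows from how unicode() treats a plain str: it decodes it as ASCII (the error message says as much), and 0xE4 is outside that range. A small sketch that reproduces the failure explicitly: the function name is hypothetical, and the byte strings are built with encode('latin-1') only to get a value containing 0xE4 like the str shown above.

```python
def decodes_as_ascii(raw):
    """Return True if the byte string can be decoded with the ASCII codec."""
    try:
        raw.decode('ascii')
    except UnicodeError:
        # On newer interpreters the concrete class is UnicodeDecodeError,
        # which is a subclass of UnicodeError.
        return False
    return True


plain = u'Good'.encode('latin-1')               # only ASCII bytes
accented = u'Good Hyv\u00e4'.encode('latin-1')  # contains byte 0xE4

plain_ok = decodes_as_ascii(plain)
accented_ok = decodes_as_ascii(accented)
```

Here plain_ok is true and accented_ok is false, which mirrors why unicode(x) fails only when the original unicode strings had non-ASCII characters.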


----------------------------------------------------------------------


[Jython-bugs] [ jython-Bugs-1659819 ] Joining unicode items with string doesn't create unicode

From: SourceForge.net <no...@so...> - 2007-02-15 10:36:56

Bugs item #1659819, was opened at 2007-02-14 17:16
Message generated for change (Comment added) made by laukpe

>Comment By: Pekka Laukkanen (laukpe)
Date: 2007-02-15 12:36

Message:
Logged In: YES 
user_id=1379331
Originator: YES

Concatenating a string and a unicode string seems to work correctly.

Jython 2.2b1 on java1.5.0_10 (JIT: null)
Type "copyright", "credits" or "license" for more information.
>>> x = 'Good ' + u'Hyv\u00E4'
>>> type(x)
<type 'unicode'>
>>> x
u'Good Hyv\xE4'
>>> unicode(x)
u'Good Hyv\xE4'


[Jython-bugs] [ jython-Bugs-1659819 ] Joining unicode items with string doesn't create unicode

From: SourceForge.net <no...@so...> - 2007-02-15 22:16:48

Bugs item #1659819, was opened at 2007-02-14 17:16
Message generated for change (Comment added) made by laukpe

>Comment By: Pekka Laukkanen (laukpe)
Date: 2007-02-16 00:16

Message:
Logged In: YES 
user_id=1379331
Originator: YES

The replace method also seems to be affected. I tested with a post-alpha
snapshot this time because I don't have the beta installed on this machine.
I couldn't find other string methods that could have this problem, but it
would probably be better for someone else to go through them as well.


Jython 2.2a3005 on java1.5.0_09 (JIT: null)
Type "copyright", "credits" or "license" for more information.
>>> x = 'a b'.replace('b',u'b')
>>> type(x)
<type 'str'>
>>> x
'a b'
>>>


Python 2.4.3 (#1, May 18 2006, 07:40:45) 
[GCC 3.3.3 (cygwin special)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> x = 'a b'.replace('b',u'b')
>>> type(x)
<type 'unicode'>
>>> x
u'a b'
>>> 



[Jython-bugs] [ jython-Bugs-1659819 ] Joining unicode items with string doesn't create unicode

From: SourceForge.net <no...@so...> - 2007-02-15 22:36:35

Bugs item #1659819, was opened at 2007-02-14 17:16
Message generated for change (Comment added) made by laukpe

>Comment By: Pekka Laukkanen (laukpe)
Date: 2007-02-16 00:36

Message:
Logged In: YES 
user_id=1379331
Originator: YES

I was trying to find the cause of some of these problems (partly to see
how easily I can understand Jython internals) and found the following in
/trunk/jython/src/templates/str.expose (revision 3058).

    expose_meth: join o
        String result = self.str_join(arg0);
        //XXX: do we really need to check self?
        if (self instanceof PyUnicode || arg0 instanceof PyUnicode) {
            return new PyUnicode(result);
        } else {
            return new PyString(result);
        }

I know next to nothing about these expose files (as I understand it, the
actual Java code is generated from them), but if I read it correctly, "arg0
instanceof PyUnicode" checks whether the argument given to the join method
is itself a unicode object. If so, the bug in join is exactly here: the
argument is normally a sequence such as a list, not a unicode object
itself. Instead, the code should check whether any of the items in the
sequence is unicode. The XXX comment can probably also be removed, because
u''.join([]) should create a unicode object too.

If str.expose is the right place to fix these issues, then the replace
method ought to get checks similar to those in join.

Fixing join and replace still leaves the "'%s' % u'x'" issue open.
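
The rule proposed above for join can be modelled in a few lines of Python. This is only an illustration of the intended decision, with a made-up function name; the real change would be Java in the join section of str.expose, and the model is only meaningful on a 2.x interpreter where str and unicode are distinct types.

```python
UNICODE_TYPE = type(u'')  # the unicode type on Python/Jython 2.x


def join_should_return_unicode(separator, items):
    """Model of the proposed rule for the result type of separator.join(items).

    The result should be unicode if the separator is unicode or if any
    item in the sequence is unicode. Hypothetical helper for illustration.
    """
    if isinstance(separator, UNICODE_TYPE):
        # Covers u''.join([]), so the check on self is still needed.
        return True
    for item in items:
        if isinstance(item, UNICODE_TYPE):
            return True
    return False
```

The check on the separator is what keeps u''.join([]) unicode even though there are no items to inspect.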


[Jython-bugs] [ jython-Bugs-1659819 ] Joining unicode items with string doesn't create unicode

From: SourceForge.net <no...@so...> - 2007-02-16 06:25:08

Bugs item #1659819, was opened at 2007-02-14 10:16
Message generated for change (Comment added) made by cgroves
>Group: targeted for 2.2beta2

>Comment By: Charles Groves (cgroves)
Date: 2007-02-16 01:25

Message:
Logged In: YES 
user_id=1174327
Originator: NO

It's nice to see you digging into the code, Pekka.

It does sound like you've pinpointed the problem in join.  To test it out,
you can modify the indented section after "expose_meth: join o" to be the
fixed Java code you've suggested.  Then run 'python gexpose.py str.expose
../org/python/core/PyString.java' from the templates directory.  That'll
update PyString.java with your changes.  After that it's just a matter of
running a development copy of Jython as described in
http://wiki.python.org/jython/JythonDeveloperGuide

I'll probably get to this for beta2, but patches always speed things along.


[Jython-bugs] [ jython-Bugs-1659819 ] Joining unicode items with string doesn't create unicode

From: SourceForge.net <no...@so...> - 2007-02-16 10:25:54

Bugs item #1659819, was opened at 2007-02-14 17:16
Message generated for change (Comment added) made by laukpe

>Comment By: Pekka Laukkanen (laukpe)
Date: 2007-02-16 12:25

Message:
Logged In: YES 
user_id=1379331
Originator: YES

I need to set up a Jython development environment before playing with this
further, but I assume that's not too hard nowadays. I think these problems
really need automated tests, so I'll probably need to investigate how the
Jython test system works too. If I run into any problems, I'll send
questions to jython-dev.
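
As a starting point for such tests, the cases reported in this item could be written as a plain unittest case along these lines. This is a sketch only: the class and method names are invented here, and how it would be hooked into Jython's own test suite is not covered in this thread.

```python
import unittest

UNICODE_TYPE = type(u'')  # the unicode type on Python/Jython 2.x


class MixedStrUnicodeTest(unittest.TestCase):
    """Hypothetical test case; class and method names are illustrative."""

    def test_join_with_str_separator(self):
        self.assertEqual(UNICODE_TYPE, type(' '.join([u'Hyv\u00e4', u'Good'])))

    def test_percent_formatting(self):
        self.assertEqual(UNICODE_TYPE, type('Good %s' % u'Hyv\u00e4'))

    def test_replace_with_unicode_argument(self):
        self.assertEqual(UNICODE_TYPE, type('a b'.replace('b', u'b')))

    def test_concatenation(self):
        self.assertEqual(UNICODE_TYPE, type('Good ' + u'Hyv\u00e4'))
```

Going by the transcripts in this item, the join, formatting and replace tests would be expected to fail on the affected Jython builds and pass on CPython 2.4.3, while the concatenation test should pass on both.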

----------------------------------------------------------------------

Comment By: Charles Groves (cgroves)
Date: 2007-02-16 08:25

Message:
Logged In: YES 
user_id=1174327
Originator: NO

It's nice to see you digging into the code Pekka.  

It does sound like you've pinpointed the problem in join.  To test it out,
you can modify the indented section past expose_meth: join o to be the
fixed java code you've suggested.  Then run 'python gexpose.py str.expose
../org/python/core/PyString.java'  from the templates directory.  That'll
update PyString.java with your changes.  After that it's just a matter of
running a development copy of jython as described in
http://wiki.python.org/jython/JythonDeveloperGuide  I'll probably get to
this for beta2, but patches always speed things along.

----------------------------------------------------------------------

Comment By: Pekka Laukkanen (laukpe)
Date: 2007-02-16 00:36

Message:
Logged In: YES 
user_id=1379331
Originator: YES

I was trying to find the cause for some of these problems (partly to see
how easily I can understand Jython internals) and found the following from
/trunk/jython/src/templates/str.expose (revision 3058). 

   41 expose_meth: join o
   42     String result = self.str_join(arg0);
   43     //XXX: do we really need to check self?
   44     if (self instanceof PyUnicode || arg0 instanceof PyUnicode) {
   45         return new PyUnicode(result);
   46     } else {
   47         return new PyString(result);
   48     }

I know next to nothing about these expose files (as I understand it, the
actual Java code is generated from them), but if I read it correctly, "arg0
instanceof PyUnicode" checks whether the argument given to the join method
is itself a unicode object. If that's the case, then the bug in join is
exactly here: the argument is the sequence being joined (normally a list or
tuple), not a unicode object itself. Instead, the code should check whether
any of the items in the sequence is of unicode type. The XXX comment can
probably also be removed, because u''.join([]) should create a unicode
object too.

If str.expose is the right place to fix these issues, then the replace
method ought to get checks similar to those in join.

Fixing join and replace still leaves the "'%s' % u'x'" issue open.
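
A minimal Java sketch of the check described above: decide the result type
from the separator and from the items, not from the sequence object itself.
The classes Str and Uni and the method joinYieldsUnicode are hypothetical
stand-ins invented for illustration; they are not Jython's PyString and
PyUnicode and not the generated code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for Jython's PyString and PyUnicode, not the real classes.
final class JoinTypeSketch {
    static class Str {
        final String value;
        Str(String value) { this.value = value; }
    }

    static class Uni extends Str {
        Uni(String value) { super(value); }
    }

    // Returns true when the joined result should be unicode: the separator is
    // unicode, or at least one item is. Stops at the first unicode item found.
    static boolean joinYieldsUnicode(Str separator, List<? extends Str> items) {
        if (separator instanceof Uni) {
            return true;
        }
        for (Str item : items) {
            if (item instanceof Uni) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Str> unicodeItems = Arrays.<Str>asList(new Uni("a"), new Uni("b"));
        List<Str> plainItems = Arrays.<Str>asList(new Str("a"), new Str("b"));
        System.out.println(joinYieldsUnicode(new Str(" "), unicodeItems));
        System.out.println(joinYieldsUnicode(new Str(" "), plainItems));
        // An empty sequence joined with a unicode separator is still unicode.
        System.out.println(joinYieldsUnicode(new Uni(""), Arrays.<Str>asList()));
    }
}
```

The separator check comes first so that the u''.join([]) case is covered
even when there are no items to inspect.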

----------------------------------------------------------------------

Comment By: Pekka Laukkanen (laukpe)
Date: 2007-02-16 00:16

Message:
Logged In: YES 
user_id=1379331
Originator: YES

Replace also seems to be affected. I'm testing with a post-alpha snapshot
this time because I don't have the beta installed on this machine. I
couldn't find other Python string methods that could have this problem, but
it's probably better that someone else also goes through them.


Jython 2.2a3005 on java1.5.0_09 (JIT: null)
Type "copyright", "credits" or "license" for more information.
>>> x = 'a b'.replace('b',u'b')
>>> type(x)
<type 'str'>
>>> x
'a b'
>>>


Python 2.4.3 (#1, May 18 2006, 07:40:45) 
[GCC 3.3.3 (cygwin special)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> x = 'a b'.replace('b',u'b')
>>> type(x)
<type 'unicode'>
>>> x
u'a b'
>>> 
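
A rough Java sketch of the promotion rule that the CPython session above
suggests for replace: the result is unicode when the receiver or either
argument is unicode, whether or not anything was actually replaced. Str and
Uni are hypothetical stand-ins, not Jython's PyString and PyUnicode, and
applying the rule to the first argument as well is an assumption that the
sessions above do not test.

```java
// Hypothetical stand-ins for Jython's PyString and PyUnicode, not the real classes.
final class ReplaceTypeSketch {
    static class Str {
        final String value;
        Str(String value) { this.value = value; }

        // Replaces every occurrence of oldText and promotes the result to Uni
        // when the receiver or either argument is a Uni.
        Str replace(Str oldText, Str newText) {
            String result = value.replace(oldText.value, newText.value);
            boolean unicode = this instanceof Uni
                    || oldText instanceof Uni
                    || newText instanceof Uni;
            return unicode ? new Uni(result) : new Str(result);
        }
    }

    static class Uni extends Str {
        Uni(String value) { super(value); }
    }

    public static void main(String[] args) {
        Str x = new Str("a b").replace(new Str("b"), new Uni("b"));
        System.out.println(x.getClass().getSimpleName() + ": " + x.value);
    }
}
```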


----------------------------------------------------------------------

Comment By: Pekka Laukkanen (laukpe)
Date: 2007-02-15 12:36

Message:
Logged In: YES 
user_id=1379331
Originator: YES

Concatenating a string and a unicode string seems to work correctly.

Jython 2.2b1 on java1.5.0_10 (JIT: null)
Type "copyright", "credits" or "license" for more information.
>>> x = 'Good ' + u'Hyv\u00E4'
>>> type(x)
<type 'unicode'>
>>> x
u'Good Hyv\xE4'
>>> unicode(x)
u'Good Hyv\xE4'

----------------------------------------------------------------------

Comment By: Pekka Laukkanen (laukpe)
Date: 2007-02-15 12:35

Message:
Logged In: YES 
user_id=1379331
Originator: YES

The same problem also appears if you create a string using a pattern like
'Something %s' and the substituted string is unicode. The examples below
demonstrate.


Jython 2.2b1 on java1.5.0_10 (JIT: null)
Type "copyright", "credits" or "license" for more information.
>>> x = 'Good %s' % u'Hyv\u00E4'
>>> type(x)
<type 'str'>
>>> x
'Good Hyv\xE4'
>>> unicode(x)
Traceback (innermost last):
  File "<console>", line 1, in ?
UnicodeError: ascii decoding error: ordinal not in range(128)
>>>

Python 2.4.3 (#1, May 18 2006, 07:40:45) 
[GCC 3.3.3 (cygwin special)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> x = 'Good %s' % u'Hyv\u00E4'
>>> type(x)
<type 'unicode'>
>>> x
u'Good Hyv\xe4'
>>> unicode(x)
u'Good Hyv\xe4'
>>> 


----------------------------------------------------------------------


[Jython-bugs] [ jython-Bugs-1659819 ] Joining unicode items with string doesn't create unicode

From: SourceForge.net <no...@so...> - 2007-02-17 13:49:13

Bugs item #1659819, was opened at 2007-02-14 17:16
Message generated for change (Comment added) made by laukpe
Category: Core
Group: targeted for 2.2beta2
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Pekka Laukkanen (laukpe)
Assigned to: Nobody/Anonymous (nobody)
Summary: Joining unicode items with string doesn't create unicode

----------------------------------------------------------------------

>Comment By: Pekka Laukkanen (laukpe)
Date: 2007-02-17 15:48

Message:
Logged In: YES 
user_id=1379331
Originator: YES

I was able to fix join and the patch is below. Before actually submitting
it, I want to create tests for it, try to fix the other affected methods
too, and get some comments on the patch.

The patch itself is not too complicated, but I'm a bit worried about the
overhead of iterating over the given sequence twice: first in
PyString.str_join and then again in this code. I made the check for unicode
items short-circuit, but the worst case of iterating over the whole
sequence when there is nothing unicode in it is unfortunately also the
common case. I'd say a better approach would be to determine the return
type already in PyString.str_join, but that requires changes in so many
places that it's better done by someone who also understands the big
picture behind this expose/derive system. Even changing PyString.str_join
to return PyString instead of String requires a few changes elsewhere in
PyString.


Index: src/templates/str.expose
===================================================================
--- src/templates/str.expose    (revision 3110)
+++ src/templates/str.expose    (working copy)
@@ -40,12 +40,17 @@
 expose_meth: :b isupper
 expose_meth: join o
     String result = self.str_join(arg0);
-    //XXX: do we really need to check self?
-    if (self instanceof PyUnicode || arg0 instanceof PyUnicode) {
+    if (self instanceof PyUnicode) {
         return new PyUnicode(result);
-    } else {
-        return new PyString(result);
     }
+    PyObject iter = arg0.__iter__();
+    PyObject obj = null;
+    for (int i = 0; (obj = iter.__iternext__()) != null; i++) {
+        if (obj instanceof PyUnicode) {
+            return new PyUnicode(result);
+        }
+    }
+    return new PyString(result);
 expose_meth: :s ljust i
 expose_meth: :s lower
 expose_meth: :s lstrip S?
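
The alternative mentioned above, deciding the return type while joining
instead of in a second pass, might look roughly like the following Java
sketch. Str, Uni and the static join method are hypothetical stand-ins made
up for this example; this is not the code in PyString.str_join. Reading the
items through an Iterator also means the sketch never needs to walk the
input twice.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical stand-ins for Jython's PyString and PyUnicode, not the real classes.
final class SinglePassJoinSketch {
    static class Str {
        final String value;
        Str(String value) { this.value = value; }
    }

    static class Uni extends Str {
        Uni(String value) { super(value); }
    }

    // Joins the items in one pass and remembers whether any of them (or the
    // separator) was unicode, so no second iteration is needed.
    static Str join(Str separator, Iterator<? extends Str> items) {
        StringBuilder buf = new StringBuilder();
        boolean unicode = separator instanceof Uni;
        boolean first = true;
        while (items.hasNext()) {
            Str item = items.next();
            if (!first) {
                buf.append(separator.value);
            }
            buf.append(item.value);
            if (item instanceof Uni) {
                unicode = true;
            }
            first = false;
        }
        String joined = buf.toString();
        return unicode ? new Uni(joined) : new Str(joined);
    }

    public static void main(String[] args) {
        Iterator<Str> items =
                Arrays.<Str>asList(new Uni("Hyv\u00E4"), new Str("Good")).iterator();
        Str result = join(new Str(" "), items);
        System.out.println(result.getClass().getSimpleName() + ": " + result.value);
    }
}
```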


----------------------------------------------------------------------


[Jython-bugs] [ jython-Bugs-1659819 ] Joining unicode items with string doesn't create unicode

From: SourceForge.net <no...@so...> - 2007-02-18 06:36:05

Bugs item #1659819, was opened at 2007-02-14 10:16
Message generated for change (Comment added) made by cgroves

----------------------------------------------------------------------

>Comment By: Charles Groves (cgroves)
Date: 2007-02-18 01:36

Message:
Logged In: YES 
user_id=1174327
Originator: NO

I think we can actually remove the self instanceof PyUnicode as well while
we're working on this; PyUnicode exposes a join method of its own that
always returns PyUnicode.

I only see a couple things you'd have to change to have str_join return
PyString: PyString.join would have to return str_join(seq).string and
PyUnicode.unicode_join would have to replace the result with an instance of
PyUnicode if the str_join didn't require it.  That's probably better
because I think it means we could remove the custom method implementation
code for join in str.expose in addition to getting rid of a second
iteration.
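
One possible reading of the layering suggested above, as a hedged Java
sketch: a typed strJoin that picks the result class in a single pass, a
Java-facing join that unwraps the string, and a unicodeJoin that promotes
the result only when strJoin did not already return unicode. The classes
Str and Uni and their method bodies are hypothetical stand-ins written for
illustration; only the method names echo the ones mentioned in the comment.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for Jython's PyString and PyUnicode, not the real classes.
final class JoinLayeringSketch {
    static class Str {
        final String string;
        Str(String string) { this.string = string; }

        // Typed join: one pass over the items, returning a Uni only when
        // at least one item is a Uni. The receiver's own type is not checked.
        Str strJoin(List<? extends Str> seq) {
            StringBuilder buf = new StringBuilder();
            boolean sawUnicode = false;
            boolean first = true;
            for (Str item : seq) {
                if (!first) {
                    buf.append(string);
                }
                buf.append(item.string);
                if (item instanceof Uni) {
                    sawUnicode = true;
                }
                first = false;
            }
            String joined = buf.toString();
            return sawUnicode ? new Uni(joined) : new Str(joined);
        }

        // Java-facing convenience that discards the type information.
        String join(List<? extends Str> seq) {
            return strJoin(seq).string;
        }
    }

    static class Uni extends Str {
        Uni(String string) { super(string); }

        // Always yields a Uni, wrapping the result only when needed.
        Uni unicodeJoin(List<? extends Str> seq) {
            Str result = strJoin(seq);
            return (result instanceof Uni) ? (Uni) result : new Uni(result.string);
        }
    }

    public static void main(String[] args) {
        List<Str> plain = Arrays.<Str>asList(new Str("a"), new Str("b"));
        System.out.println(new Str(" ").join(plain));
        System.out.println(new Uni("-").unicodeJoin(plain).string);
    }
}
```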

----------------------------------------------------------------------


[Jython-bugs] [ jython-Bugs-1659819 ] Joining unicode items with string doesn't create unicode

From: SourceForge.net <no...@so...> - 2007-02-19 01:15:09

Bugs item #1659819, was opened at 2007-02-14 17:16
Message generated for change (Comment added) made by laukpe

----------------------------------------------------------------------

>Comment By: Pekka Laukkanen (laukpe)
Date: 2007-02-19 03:15

Message:
Logged In: YES 
user_id=1379331
Originator: YES

I was able to get join and replace working correctly so that argument
checking is done only in the hand-written part of PyString. I'll attach a
patch so that you can check whether it is even close to correct. The fixes
are still missing tests, and the "'%s' % u'x'" issue is still open.

File Added: patch.1

----------------------------------------------------------------------

Comment By: Charles Groves (cgroves)
Date: 2007-02-18 08:36

Message:
Logged In: YES 
user_id=1174327
Originator: NO

I think we can actually remove the self instanceof PyUnicode as well while
we're working on this; PyUnicode exposes a join method of its own that
always returns PyUnicode.

I only see a couple things you'd have to change to have str_join return
PyString: PyString.join would have to return str_join(seq).string and
PyUnicode.unicode_join would have to replace the result with an instance of
PyUnicode if the str_join didn't require it.  That's probably better
because I think it means we could remove the custom method implementation
code for join in str.expose in addition to getting rid of a second
iteration.

----------------------------------------------------------------------

Comment By: Pekka Laukkanen (laukpe)
Date: 2007-02-17 15:48

Message:
Logged In: YES 
user_id=1379331
Originator: YES

I was able to fix join and the patch is below. Before really submitting it
I want to create tests for this first, try to fix also other affected
methods and get some comments about the patch.

The patch itself is not too complicated but I'm a bit worried about the
overhead in iterating over the given sequence twice -- first in
PyString.str_join and then again in this code. I made the check for
possible unicode items so that it short-circuits but the worst case of
iterating over the whole sequence when there is nothing unicode is
unfortunately also the common case. I'd say a better approach would be
determining the return type already in PyString.str_join but that requires
changes into so many places that it's better done by someone who
understands also the big picture behind this expose/derive system. Already
changing PyString.str_join to return PyString instead of String requires
few changes elsewhere in PyString.


Index: src/templates/str.expose
===================================================================
--- src/templates/str.expose    (revision 3110)
+++ src/templates/str.expose    (working copy)
@@ -40,12 +40,17 @@
 expose_meth: :b isupper
 expose_meth: join o
     String result = self.str_join(arg0);
-    //XXX: do we really need to check self?
-    if (self instanceof PyUnicode || arg0 instanceof PyUnicode) {
+    if (self instanceof PyUnicode) {
         return new PyUnicode(result);
-    } else {
-        return new PyString(result);
     }
+    PyObject iter = arg0.__iter__();
+    PyObject obj = null;
+    for (int i = 0; (obj = iter.__iternext__()) != null; i++) {
+        if (obj instanceof PyUnicode) {
+            return new PyUnicode(result);
+        }
+    }
+    return new PyString(result);
 expose_meth: :s ljust i
 expose_meth: :s lower
 expose_meth: :s lstrip S?


----------------------------------------------------------------------

Comment By: Pekka Laukkanen (laukpe)
Date: 2007-02-16 12:25

Message:
Logged In: YES 
user_id=1379331
Originator: YES

I need to set up Jython devenv before starting to play with this more but
I assume that's not too hard nowadays. I think these problems really need
some automated tests so I probably need to investigate how Jython test
system works too. If I got any problems I'll send queries to jython-dev.

----------------------------------------------------------------------

Comment By: Charles Groves (cgroves)
Date: 2007-02-16 08:25

Message:
Logged In: YES 
user_id=1174327
Originator: NO

It's nice to see you digging into the code Pekka.  

It does sound like you've pinpointed the problem in join.  To test it out,
you can modify the indented section past expose_meth: join o to be the
fixed java code you've suggested.  Then run 'python gexpose.py str.expose
../org/python/core/PyString.java'  from the templates directory.  That'll
update PyString.java with your changes.  After that it's just a matter of
running a development copy of jython as described in
http://wiki.python.org/jython/JythonDeveloperGuide  I'll probably get to
this for beta2, but patches always speed things along.

----------------------------------------------------------------------

Comment By: Pekka Laukkanen (laukpe)
Date: 2007-02-16 00:36

Message:
Logged In: YES 
user_id=1379331
Originator: YES

I was trying to find the cause for some of these problems (partly to see
how easily I can understand Jython internals) and found the following from
/trunk/jython/src/templates/str.expose (revision 3058). 

   41 expose_meth: join o
   42     String result = self.str_join(arg0);
   43     //XXX: do we really need to check self?
   44     if (self instanceof PyUnicode || arg0 instanceof PyUnicode) {
   45         return new PyUnicode(result);
   46     } else {
   47         return new PyString(result);
   48     }

I know next to nothing about these expose files (I've understood the
actual Java code is generated from them) but if I got it correctly "arg0
instanceof PyUnicode" checks is the argument given to the join method a
unicode object or not. If that's the case then the bug in join is exactly
here -- the argument given to it is a sequence and thus never unicode
itself. Instead the code should check is any of the items in the sequence
of unicode type. The XXX comment can also probably be removed because
u''.join([]) should create a unicode object too.

If str.expose is the right place to fix these issues then also replace
method ought to get similar checks as join has.

Fixing join and replace still leaves "'%s' % u'x'" issue open.

----------------------------------------------------------------------

Comment By: Pekka Laukkanen (laukpe)
Date: 2007-02-16 00:16

Message:
Logged In: YES 
user_id=1379331
Originator: YES

Also replace seems to be affected. Testing with a post alpha snapshot in
this time because I don't have beta installed into this machine. I couldn't
find other methods implemented by Python string that could have this
problem but probably it's better that someone else also goes through them.


Jython 2.2a3005 on java1.5.0_09 (JIT: null)
Type "copyright", "credits" or "license" for more information.
>>> x = 'a b'.replace('b',u'b')
>>> type(x)
<type 'str'>
>>> x
'a b'
>>>


Python 2.4.3 (#1, May 18 2006, 07:40:45) 
[GCC 3.3.3 (cygwin special)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> x = 'a b'.replace('b',u'b')
>>> type(x)
<type 'unicode'>
>>> x
u'a b'
>>> 


----------------------------------------------------------------------

Comment By: Pekka Laukkanen (laukpe)
Date: 2007-02-15 12:36

Message:
Logged In: YES 
user_id=1379331
Originator: YES

Concatenating a string and a unicode object seems to work correctly.

Jython 2.2b1 on java1.5.0_10 (JIT: null)
Type "copyright", "credits" or "license" for more information.
>>> x = 'Good ' + u'Hyv\u00E4'
>>> type(x)
<type 'unicode'>
>>> x
u'Good Hyv\xE4'
>>> unicode(x)
u'Good Hyv\xE4'

----------------------------------------------------------------------

Comment By: Pekka Laukkanen (laukpe)
Date: 2007-02-15 12:35

Message:
Logged In: YES 
user_id=1379331
Originator: YES

The same problem also appears if you create a string using a pattern like
'Something %s' and the substituted string is unicode. The examples below
demonstrate this.


Jython 2.2b1 on java1.5.0_10 (JIT: null)
Type "copyright", "credits" or "license" for more information.
>>> x = 'Good %s' % u'Hyv\u00E4'
>>> type(x)
<type 'str'>
>>> x
'Good Hyv\xE4'
>>> unicode(x)
Traceback (innermost last):
  File "<console>", line 1, in ?
UnicodeError: ascii decoding error: ordinal not in range(128)
>>>

Python 2.4.3 (#1, May 18 2006, 07:40:45) 
[GCC 3.3.3 (cygwin special)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> x = 'Good %s' % u'Hyv\u00E4'
>>> type(x)
<type 'unicode'>
>>> x
u'Good Hyv\xe4'
>>> unicode(x)
u'Good Hyv\xe4'
>>> 


----------------------------------------------------------------------


[Jython-bugs] [ jython-Bugs-1659819 ] Joining unicode items with string doesn't create unicode

From: SourceForge.net <no...@so...> - 2007-02-28 18:58:48

Bugs item #1659819, was opened at 2007-02-14 10:16
Message generated for change (Comment added) made by cgroves
Category: Core
Group: targeted for 2.2beta2
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: Pekka Laukkanen (laukpe)
Assigned to: Nobody/Anonymous (nobody)
Summary: Joining unicode items with string doesn't create unicode

----------------------------------------------------------------------

>Comment By: Charles Groves (cgroves)
Date: 2007-02-28 13:58

Message:
Logged In: YES 
user_id=1174327
Originator: NO

join and replace fix applied in r3127.  I opened a new bug for the format
issue in http://jython.org/bugs/1671134

----------------------------------------------------------------------

Comment By: Pekka Laukkanen (laukpe)
Date: 2007-02-18 20:15

Message:
Logged In: YES 
user_id=1379331
Originator: YES

I was able to get join and replace working correctly so that the argument
checking is done only in the hand-written part of PyString. I'll attach
a patch so that you can check whether it is even close to correct. The fixes
still lack tests, and the "'%s' % u'x'" issue is still open.

File Added: patch.1
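
Since the fixes still lack tests, a regression check could look roughly like
the following. It only restates the interpreter sessions from this report as
assertions; the function names are made up for illustration and nothing here
is taken from the Jython test suite. On Python 3 every string is already
unicode, so the checks pass trivially there.

```python
# `unicode` on Python 2 and Jython, `str` on Python 3.
UNICODE = type(u'')


def join_gives_unicode():
    # Plain separator, unicode items: expected to produce unicode.
    joined = ' '.join([u'Hyv\u00e4', u'Good'])
    return isinstance(joined, UNICODE) and joined == u'Hyv\u00e4 Good'


def replace_gives_unicode():
    # Plain string, unicode replacement: expected to produce unicode.
    replaced = 'a b'.replace('b', u'b')
    return isinstance(replaced, UNICODE) and replaced == u'a b'


def format_gives_unicode():
    # Plain pattern, unicode argument: the still-open '%s' issue.
    formatted = 'Good %s' % u'Hyv\u00e4'
    return isinstance(formatted, UNICODE) and formatted == u'Good Hyv\u00e4'
```

Each function returns True when the interpreter behaves like the Python 2.4.3
sessions above and False when it behaves like the Jython 2.2b1 sessions.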

----------------------------------------------------------------------

Comment By: Charles Groves (cgroves)
Date: 2007-02-18 01:36

Message:
Logged In: YES 
user_id=1174327
Originator: NO

I think we can actually remove the self instanceof PyUnicode as well while
we're working on this; PyUnicode exposes a join method of its own that
always returns PyUnicode.

I only see a couple things you'd have to change to have str_join return
PyString: PyString.join would have to return str_join(seq).string and
PyUnicode.unicode_join would have to replace the result with an instance of
PyUnicode if the str_join didn't require it.  That's probably better
because I think it means we could remove the custom method implementation
code for join in str.expose in addition to getting rid of a second
iteration.

----------------------------------------------------------------------

Comment By: Pekka Laukkanen (laukpe)
Date: 2007-02-17 08:48

Message:
Logged In: YES 
user_id=1379331
Originator: YES

I was able to fix join, and the patch is below. Before really submitting it
I want to create tests for it, try to fix the other affected methods too,
and get some comments on the patch.

The patch itself is not too complicated, but I'm a bit worried about the
overhead of iterating over the given sequence twice: first in
PyString.str_join and then again in this code. I made the check for
possible unicode items short-circuit, but the worst case, iterating over
the whole sequence when it contains nothing unicode, is unfortunately also
the common case. I'd say a better approach would be to determine the return
type already in PyString.str_join, but that requires changes in so many
places that it's better done by someone who also understands the big
picture behind this expose/derive system. Even changing PyString.str_join
to return PyString instead of String requires a few changes elsewhere in
PyString.


Index: src/templates/str.expose
===================================================================
--- src/templates/str.expose    (revision 3110)
+++ src/templates/str.expose    (working copy)
@@ -40,12 +40,17 @@
 expose_meth: :b isupper
 expose_meth: join o
     String result = self.str_join(arg0);
-    //XXX: do we really need to check self?
-    if (self instanceof PyUnicode || arg0 instanceof PyUnicode) {
+    if (self instanceof PyUnicode) {
         return new PyUnicode(result);
-    } else {
-        return new PyString(result);
     }
+    PyObject iter = arg0.__iter__();
+    PyObject obj = null;
+    for (int i = 0; (obj = iter.__iternext__()) != null; i++) {
+        if (obj instanceof PyUnicode) {
+            return new PyUnicode(result);
+        }
+    }
+    return new PyString(result);
 expose_meth: :s ljust i
 expose_meth: :s lower
 expose_meth: :s lstrip S?
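
As an alternative to the second iteration in the patch above, the return type
could be decided while the items are being joined. The following Python
sketch models that single-pass idea only; it is not how PyString.str_join is
written. The function name is hypothetical, byte literals (b'...') stand in
for plain strings so the sketch runs on Python 3, and plain strings are
assumed to be ASCII.

```python
# TEXT is `unicode` on Python 2 and `str` on Python 3.
TEXT = type(u'')


def join_single_pass(separator, items):
    # Track whether anything unicode was seen while collecting the pieces,
    # so the sequence is walked only once.
    saw_text = isinstance(separator, TEXT)
    pieces = []
    for item in items:
        if isinstance(item, TEXT):
            saw_text = True
            pieces.append(item)
        else:
            # Assumption: plain strings hold ASCII data only.
            pieces.append(item.decode('ascii'))
    if isinstance(separator, TEXT):
        sep_text = separator
    else:
        sep_text = separator.decode('ascii')
    joined = sep_text.join(pieces)
    # Return a plain string only if nothing unicode was involved.
    if saw_text:
        return joined
    return joined.encode('ascii')
```

Because the items are visited once, the sketch also accepts a one-shot
iterator, for example join_single_pass(b'-', iter([b'a', u'b'])).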


----------------------------------------------------------------------

Comment By: Pekka Laukkanen (laukpe)
Date: 2007-02-16 05:25

Message:
Logged In: YES 
user_id=1379331
Originator: YES

I need to set up a Jython development environment before I start playing
with this more, but I assume that's not too hard nowadays. I think these
problems really need some automated tests, so I probably need to investigate
how the Jython test system works too. If I run into any problems I'll send
queries to jython-dev.

----------------------------------------------------------------------

Comment By: Charles Groves (cgroves)
Date: 2007-02-16 01:25

Message:
Logged In: YES 
user_id=1174327
Originator: NO

It's nice to see you digging into the code Pekka.  

It does sound like you've pinpointed the problem in join.  To test it out,
you can modify the indented section past expose_meth: join o to be the
fixed java code you've suggested.  Then run 'python gexpose.py str.expose
../org/python/core/PyString.java'  from the templates directory.  That'll
update PyString.java with your changes.  After that it's just a matter of
running a development copy of jython as described in
http://wiki.python.org/jython/JythonDeveloperGuide  I'll probably get to
this for beta2, but patches always speed things along.

----------------------------------------------------------------------

