#876 Watch expression messes up encoding of unicode literals

1.4.4
open
debugger (210)
5
2009-03-16
2009-03-16
No

In the debugger, if I enter a watch expression like u'á', it evaluates to "unicode: ‡", which probably does not display correctly here; the character displayed is a capital Á in a box.

However, if I escape the character in the unicode literal: u'\xe1', it displays correctly: "unicode: á"

I see no problem with the display of watch expression evaluations for unicode strings that originate elsewhere (or unicode strings specified with \x character escapes); the problem occurs only with unicode strings that originate in non-ASCII literals in the watch expression.

Most likely explanation: the literal u'á' is not read correctly from the text field in the "Edit Watch Expression" dialog. The Eclipse text field presumably holds Unicode data (it does display the character correctly when I type it in), but PyDev presumably reads its contents as a non-unicode string without specifying an encoding. In that case the platform default encoding is used, which in my case is MacRoman.

To confirm that, I try the watch expression ord(u'á'.encode('iso-8859-1')) and get int: 135. The character á is indeed at index 135 in MacRoman.
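The same check can be reproduced outside the debugger. This is a minimal sketch in Python 3 (the report itself predates it), using the standard `mac_roman` and `iso-8859-1` codecs; the variable names are only illustrative.

```python
# Compare where the character á (U+00E1) sits in MacRoman and in Latin-1.
char = "\u00e1"  # á, written as an escape so the source encoding does not matter

mac_roman_bytes = char.encode("mac_roman")
latin1_bytes = char.encode("iso-8859-1")

print(mac_roman_bytes[0])  # 135 (0x87): the value seen in the watch expression
print(latin1_bytes[0])     # 225 (0xE1): the value expected if no re-encoding happened
```

A result of 135 rather than 225 is consistent with the text having passed through MacRoman before Python saw it.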

PyDev then has a non-unicode string (i.e. really just a byte array) consisting of the single byte 135, or 0x87. It includes that in the 8-bit string that it evaluates as a Python expression, so what Python sees is a unicode literal corresponding to u'\x87', a different character.

To confirm that, I try the watch expression u'\x87' and it evaluates to the same bogus string as for u'á'.
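The suspected chain of events can be modelled in a few lines. This is a hypothetical Python 3 sketch, not PyDev's actual code: `simulate_watch_mangling` is an invented name, and it assumes that each byte of the 8-bit string is mapped one-to-one onto a code point (the Latin-1 interpretation), which is what the observed result suggests.

```python
def simulate_watch_mangling(expression, platform_encoding="mac_roman"):
    """Hypothetical model of the suspected bug, not PyDev's real code path."""
    # Step 1: Unicode text from the dialog is flattened to bytes
    # using the platform default encoding (MacRoman in the report).
    raw = expression.encode(platform_encoding)
    # Step 2: the bytes are later read back one byte per code point.
    return raw.decode("iso-8859-1")


typed = "u'\u00e1'"  # what the user typed: u'á'
mangled = simulate_watch_mangling(typed)

print(mangled == "u'\x87'")  # True: the literal now contains U+0087, not U+00E1
```

Under these assumptions the typed literal u'á' and the escaped literal u'\x87' end up as the same expression, matching the behaviour described above.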

So PyDev needs to make sure it specifies the correct character encoding when converting the watch expression from unicode into an 8-bit string. Ideally it would just stay in unicode, if it is getting unicode from the text field and can have Python evaluate a unicode string as a Python expression.
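One way the first option could look is sketched below in Python 3. The function names `to_wire` and `evaluate_from_wire` are hypothetical and do not correspond to anything in PyDev; the only point is that both sides name the same encoding explicitly (UTF-8 here, as an assumption) instead of relying on the platform default.

```python
def to_wire(expression):
    """Sender side (hypothetical): encode the watch expression explicitly."""
    return expression.encode("utf-8")


def evaluate_from_wire(raw):
    """Receiver side (hypothetical): decode with the same encoding, then evaluate."""
    return eval(raw.decode("utf-8"))


result = evaluate_from_wire(to_wire("u'\u00e1'"))
print(result == "\u00e1")  # True: the literal survives the round trip unchanged
```

Because the encoding is stated on both ends, the result no longer depends on whether the platform default is MacRoman, Latin-1, or anything else.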

(Uh, was that clear? :) )

Discussion