From: Jeff A. <ja...@fa...> - 2017-10-26 21:31:34
Incidental to working on http://bugs.jython.org/issue2632, I noticed that mixed comparisons of unicode and str do not produce the same results in Jython as in CPython.

CPython:
>>> u = u"caf\xe9"
>>> u == u.encode('latin-1')
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False

Jython:
>>> u = u"caf\xe9"
>>> u == u.encode('latin-1')
True

CPython converts the str (or whatever is opposite on the ==) into a unicode, if it can, using the default encoding. Jython just compares the internal Java string without reference to the default encoding.

This is fairly minor when the default is ASCII but becomes quite significant when someone uses sys.setdefaultencoding('utf-8'), say, in site.py or with the reload(sys) trick. This trick is unreliable and I think we would not recommend it to anyone. Nevertheless, some people find it the only way to use Python 2 libraries that have not thoroughly provided for Unicode. Also, it makes a test I devised for the csv module work in CPython and fail in Jython.

I got the impression you couldn't reload sys satisfactorily in Jython, but it seems to work. If someone does use this trick, do we intend to approximate CPython behaviour as closely as we can?

Jeff

-- 
Jeff Allen
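
For anyone who wants to play with the difference without both interpreters to hand, here is a rough sketch of the two comparison rules as described above. The function names cpython2_style_eq and jython_style_eq are made up for illustration and are not part of either implementation. The Jython model assumes that comparing the internal Java string amounts to treating each byte as the code point of the same value (that is, a latin-1 decode), which is an assumption and not a quote from the Jython source.

```python
import warnings


def cpython2_style_eq(text, raw, default_encoding="ascii"):
    """Hypothetical model of CPython 2's unicode == str comparison.

    The byte string is decoded with the default encoding; if that
    fails, a UnicodeWarning is issued and the operands count as unequal.
    """
    try:
        decoded = raw.decode(default_encoding)
    except UnicodeDecodeError:
        warnings.warn(
            "Unicode equal comparison failed to convert both arguments "
            "to Unicode - interpreting them as being unequal",
            UnicodeWarning,
        )
        return False
    return text == decoded


def jython_style_eq(text, raw):
    """Hypothetical model of the Jython behaviour reported above.

    Assumption: each byte is compared as the character with the same
    code point, ignoring the default encoding entirely.
    """
    return text == raw.decode("latin-1")


u = u"caf\xe9"
latin1_bytes = u.encode("latin-1")
utf8_bytes = u.encode("utf-8")

# Default encoding ASCII: the two rules disagree on the latin-1 bytes.
print(cpython2_style_eq(u, latin1_bytes))
print(jython_style_eq(u, latin1_bytes))

# Default encoding UTF-8 (as after sys.setdefaultencoding('utf-8')):
# the two rules now also disagree on the UTF-8 bytes.
print(cpython2_style_eq(u, utf8_bytes, default_encoding="utf-8"))
print(jython_style_eq(u, utf8_bytes))
```

Under this model the two rules agree on pure-ASCII data whatever the default encoding, so the disagreement is limited to byte strings containing non-ASCII bytes.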