From: Baptiste L. <bap...@gm...> - 2009-10-07 20:13:30
Trying to get started with Jython, I tried to run one of my scripts, which counts word frequencies in a document, but I ran into the error below:

>c:\jython25\jython.bat purekeyworddbtest.py
UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 2845-2851: illegal Unicode character

I've interpreted this error message as Jython somehow having an issue parsing purekeyworddbtest.py itself (I'm not sure this is the right interpretation, but the lack of a stack trace seems to hint in that direction).

So I fired up Python to figure out what was in the file at this position:

E:\prg\py\App\doc2keywords>python
Python 2.6.2 (r262:71605, Apr 14 2009, 22:40:02) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> text = open('purekeyworddbtest.py','rb').read()
>>> print repr(text[2845:2851])
'\\u1fdc'
>>> unichr(0x1fdc)
u'\u1fdc'
>>> print repr(text[2800:2900])
'\\u1fbf-\\u1fc1\\u1fc5\\u1fcd-\\u1fcf\\u1fd4-\\u1fd5\\u1fdc-\\u1fdf\\u1fed-\\u1ff1\\u1ff5\\u1ffd-\\u2070\\u2072-\\u2'

It turns out that Jython is stumbling on \u1fdc, which is part of a regular expression character range.

Any ideas why this happens and how to work around it? The script executes fine with regular Python (2.6.2, as shown above).

I'm using Jython 2.5.1 (Release_2_5_1:6813, Sep 26 2009, 13:47:54).
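
One work-around that might be worth trying is to keep the \u escapes out of the source file entirely and build the character class at run time from integer code points. The sketch below is only an illustration: RANGES and build_char_class are hypothetical names, the ranges are a small subset of those visible in the output above, and I have not confirmed that Jython 2.5.1 accepts unichr(0x1fdc) at run time.

```python
import re

try:
    unichr  # Python 2 / Jython 2.5
except NameError:
    unichr = chr  # Python 3 has no unichr; chr covers the same range

# Hypothetical subset of the ranges from the file, as (first, last) code points.
RANGES = [
    (0x1fbf, 0x1fc1),
    (0x1fc5, 0x1fc5),
    (0x1fcd, 0x1fcf),
    (0x1fd4, 0x1fd5),
    (0x1fdc, 0x1fdf),
]

def build_char_class(ranges):
    """Return a regex character class covering the given code point ranges."""
    parts = []
    for first, last in ranges:
        if first == last:
            parts.append(re.escape(unichr(first)))
        else:
            parts.append(re.escape(unichr(first)) + u'-' + re.escape(unichr(last)))
    return u'[' + u''.join(parts) + u']'

# No \u escape appears in any string literal, so the parser never has to decode one.
pattern = re.compile(build_char_class(RANGES))
```

If this does get past the parser, it would suggest the problem lies in how Jython decodes \u escapes in string literals rather than in the regular expression itself.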