[Jython-users] Strange UnicodeDecodeError while parsing(?) python script to run

While getting started with Jython, I tried to run one of my scripts,
which counts word frequencies in a document, but I ran into the error
below:

>c:\jython25\jython.bat purekeyworddbtest.py
UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in
position 2845-2851: illegal Unicode character

I read this error message as Jython having trouble parsing
purekeyworddbtest.py itself (I'm not sure this is the right
interpretation, but the lack of a stack trace seems to point in that
direction...)

So I fired up Python to see what was in the file at that position:

E:\prg\py\App\doc2keywords>python
Python 2.6.2 (r262:71605, Apr 14 2009, 22:40:02) [MSC v.1500 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> text = open('purekeyworddbtest.py','rb').read()
>>> print repr(text[2845:2851])
'\\u1fdc'
>>> unichr(0x1fdc)
u'\u1fdc'
>>> print repr(text[2800:2900])
'\\u1fbf-\\u1fc1\\u1fc5\\u1fcd-\\u1fcf\\u1fd4-\\u1fd5\\u1fdc-\\u1fdf\\u1fed-\\u1ff1\\u1ff5\\u1ffd-\\u2070\\u2072-\\u2'
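
The same check could be scripted instead of slicing by hand. This is only a rough sketch: find_unicode_escapes is a hypothetical helper name (it is not part of my script), and it just lists every literal \uXXXX escape in a byte range along with its offset, so the one at the reported position is easy to spot.

```python
import re

# Matches a literal backslash-u escape followed by four hex digits, in raw bytes.
_ESCAPE_RE = re.compile(br'\\u([0-9a-fA-F]{4})')

def find_unicode_escapes(data, start=0, end=None):
    """Return (byte offset, code point) pairs for escapes found in data[start:end]."""
    if end is None:
        end = len(data)
    return [(m.start(), int(m.group(1).decode('ascii'), 16))
            for m in _ESCAPE_RE.finditer(data, start, end)]
```

For the file above it would be called as find_unicode_escapes(open('purekeyworddbtest.py', 'rb').read(), 2800, 2900).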

It turns out that Jython is stumbling on \u1fdc, which is part of a
character range in a regular expression. Any ideas why this happens
and how to work around it?

The script runs fine under regular Python (CPython 2.6.2). I'm using
Jython 2.5.1 (Release_2_5_1:6813, Sep 26 2009, 13:47:54).
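
One workaround I am considering is to keep the \uXXXX escapes out of the source file and build the character class at run time from code point numbers. This is an untested sketch and rests on the assumption that the error comes from decoding the string literal, which I have not confirmed on Jython; build_char_class is a hypothetical helper name, and the two ranges in the usage line are just the ones visible in the dump above.

```python
import re

try:
    unichr            # Python 2 / Jython
except NameError:
    unichr = chr      # Python 3 has no unichr

def build_char_class(ranges):
    """Build a regex character class from (low, high) code point pairs.

    Assumes code points in the Basic Multilingual Plane (up to 0xFFFF).
    """
    parts = []
    for lo, hi in ranges:
        if lo == hi:
            parts.append(re.escape(unichr(lo)))
        else:
            parts.append(re.escape(unichr(lo)) + u'-' + re.escape(unichr(hi)))
    return u'[' + u''.join(parts) + u']'

# Example: the \u1fdc-\u1fdf range plus the single character \u1fc5.
pattern = re.compile(build_char_class([(0x1fdc, 0x1fdf), (0x1fc5, 0x1fc5)]))
```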