When I use the attached script to analyse the attached tokenised news article from hs.fi, the lookup gets stuck at:
{u'POS': [u'ADVERB'], u'WORD_ID': [u'ennen']}
{u'CASE': [u'NOM', u'PAR'], u'GUESS': [u'COMPOUND'], u'ALLO': [u'IA'], u'POS': [u'NOUN', u'NOUN'], u'NUM': [u'SG', u'PL'], u'BOUNDARY': [u'COMPOUND'], u'WORD_ID': [u'kerta', u'er\xe4']}
{u'CASE': [u'NOM'], u'NUM': [u'SG'], u'SUBCAT': [u'CARD'], u'POS': [u'NUMERAL'], u'WORD_ID': [u'155']}
{u'CASE': [u'ILL'], u'ALLO': [u'VN'], u'NUM': [u'SG'], u'POS': [u'NOUN'], u'WORD_ID': [u'miljoona']}
{u'CASE': [u'ILL'], u'ALLO': [u'VN'], u'NUM': [u'SG'], u'POS': [u'NOUN'], u'WORD_ID': [u'euro']}
That is:
"ennen kertaeriä 155 miljoonaan euroon loka-joulukuussa."
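To narrow down which token the lookup stalls on, each token can be analysed on its own under a timeout. Below is a minimal sketch that assumes a hypothetical `analyse(token)` callable. It stands in for whatever per-token lookup the attached script performs and is not a real Omorfi function. The helper `find_hanging_token` is also just an illustration.

```python
import threading
import time
from typing import Callable, Iterable, Optional


def find_hanging_token(tokens: Iterable[str],
                       analyse: Callable[[str], object],
                       timeout: float = 5.0) -> Optional[str]:
    """Return the first token whose analysis does not finish within
    `timeout` seconds, or None if every token finishes in time.

    `analyse` is a hypothetical stand-in for the script's per-token lookup.
    """
    for token in tokens:
        # Run the lookup in a daemon thread so a hang cannot block interpreter exit.
        worker = threading.Thread(target=analyse, args=(token,), daemon=True)
        worker.start()
        worker.join(timeout)
        if worker.is_alive():
            # Still running after the timeout: report this token as the one that hangs.
            # Python cannot kill the thread; it is simply abandoned.
            return token
    return None


if __name__ == "__main__":
    # Simulated analyser, purely for demonstration: it "hangs" on the dummy token "HANG".
    def fake_analyse(token: str) -> str:
        if token == "HANG":
            time.sleep(60)
        return token

    print(find_hanging_token(["a", "b", "HANG", "c"], fake_analyse, timeout=0.5))
```

With the real lookup plugged in as `analyse`, the helper's return value would point at the token that needs a closer look.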
The Omorfi version used is the current git master from googlecode, with default settings.
This attachment is the tokenised HS text, since sf.net of course cannot upload two files at once :-\