Czech language tokenizer and segmenter News
Status: Pre-Alpha
Brought to you by: kveton
Finished support for external definitions of rules; PDT compatibility achieved again.
I'm studying Unicode issues and the possibility of migrating from iswalpha() and related functions to Unicode character properties.
This is the latest release before migrating to external definitions of the non-trivial tokenizer and segmenter rules.
Achieved final compatibility with PDT tokenization. Fixed an end-of-document bug.
Includes a comparison of the tokenizer with PDT 2.0.
The trivial tokenizer is working and is almost PDT-compliant.