Czech language tokenizer and segmenter / News: Recent posts

czechtok-1_0 released

Finished support for external rule definitions and restored PDT compatibility.

Posted by Drson 2008-04-17

Work temporarily frozen

I'm studying Unicode issues and the possibility of migrating from iswalpha etc. to Unicode properties.

Posted by Drson 2008-03-27

czechtok-0_4 released

Last release before migrating to external definitions of non-trivial tokenizer and segmenter rules.

Posted by Drson 2008-03-04

czechtok-0_3 released

Achieved final compatibility with PDT tokenization. Fixed an end-of-doc bug.

Posted by Drson 2008-02-21

czechtok-0_2 released

Includes a comparison of the tokenizer with PDT 2.0.

Posted by Drson 2008-02-13

czechtok-0_1 released

The trivial tokenizer is working and is almost PDT-compliant.

Posted by Drson 2008-02-12