New Pre-Processor!

Carafe now includes a long-awaited pre-processor that handles tokenization and sentence detection. This is an early release of the pre-processor, currently targeted at Latin-1 character sets. A general Unicode tokenizer is planned for the future.

Posted by Ben Wellner 2006-10-20
