Re: [Classifier4j-devel] Bayesian Case Study - Lucene tokenizer
From: moedusa <mo...@in...> - 2003-11-14 01:10:54
Concerning tokenization: is it possible to reuse the tokenization API and code from Lucene (http://jakarta.apache.org/lucene)? It has HTML tokenizers as well as stemmers, with English, French, German, Russian and Chinese implementations based on the Snowball (http://snowball.tartarus.org/) algorithms. If I am not mistaken, though, the tokenization is based on JavaCC, whereas the stemming is not.