#5 TokenGenerator not correctly used in WordFeatures

Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2006-07-16
Created: 2006-07-16
Creator: Amit Jaiswal
Private: No

The WordFeatures class has a dictionary (WordsInTrain) that
uses a tokenGenerator to tokenize the input data when
building the dictionary.

However, this tokenization is not applied in the
WordFeatures class when firing features. If the user
supplies a custom tokenizer instead of Model.TokenGenerator,
WordFeatures will not get the correctly tokenized string.

Example of a possible scenario:
Assume there is a tokenGenerator that converts each
special character in the input string to the special
token "SPLCHAR".

In FeatureGenImpl.java:
dict = new WordsInTrain(new TokenGeneratorConvertSpecial());

While firing features, the tokens are not transformed
the same way, so they do not match the dictionary built
during training and feature generation is incorrect.
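
A minimal, self-contained sketch of the mismatch follows. It assumes a
tokenizer that splits on whitespace and maps any token without letters
or digits to "SPLCHAR". All class and method names in it
(TokenizerMismatchSketch, Tokenizer, ConvertSpecialTokenizer,
Dictionary, lookupRaw, lookupTokenized) are hypothetical stand-ins for
illustration only; they are not the actual WordFeatures, WordsInTrain,
or TokenGenerator code.

```java
import java.util.HashMap;
import java.util.Map;

public class TokenizerMismatchSketch {
    /** Hypothetical stand-in for a token generator (not the library's interface). */
    interface Tokenizer {
        String[] tokenize(String text);
    }

    /** Splits on whitespace; any token with no letter or digit becomes "SPLCHAR". */
    static class ConvertSpecialTokenizer implements Tokenizer {
        public String[] tokenize(String text) {
            String[] tokens = text.trim().split("\\s+");
            for (int i = 0; i < tokens.length; i++) {
                if (!tokens[i].matches(".*[A-Za-z0-9].*")) {
                    tokens[i] = "SPLCHAR";
                }
            }
            return tokens;
        }
    }

    /** Hypothetical dictionary that maps each training token to an index. */
    static class Dictionary {
        private final Tokenizer tokenizer;
        private final Map<String, Integer> index = new HashMap<>();

        Dictionary(Tokenizer tokenizer) {
            this.tokenizer = tokenizer;
        }

        /** Builds the dictionary from tokenized training text. */
        void train(String text) {
            for (String tok : tokenizer.tokenize(text)) {
                index.putIfAbsent(tok, index.size());
            }
        }

        /** Looks up the raw string without tokenizing it (the reported behaviour). */
        int lookupRaw(String raw) {
            return index.getOrDefault(raw, -1);
        }

        /** Looks up the string after applying the training-time tokenizer. */
        int lookupTokenized(String raw) {
            String[] toks = tokenizer.tokenize(raw);
            return index.getOrDefault(toks[0], -1);
        }
    }

    public static void main(String[] args) {
        Dictionary dict = new Dictionary(new ConvertSpecialTokenizer());
        dict.train("price : $ 100");
        // The raw "$" was stored as "SPLCHAR", so the raw lookup misses.
        System.out.println("raw lookup:       " + dict.lookupRaw("$"));
        // Applying the same tokenizer first finds the "SPLCHAR" entry.
        System.out.println("tokenized lookup: " + dict.lookupTokenized("$"));
    }
}
```

In this sketch the raw lookup returns -1 (not found), while the lookup
that reuses the training-time tokenizer finds the "SPLCHAR" entry. One
possible fix along these lines would be for WordFeatures to pass each
token through the same tokenGenerator before querying the dictionary.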

-amit
