#5 TokenGenerator not correctly used in WordFeatures

Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2006-07-16
Created: 2006-07-16
Private: No

The WordFeatures class has a dictionary (WordsInTrain) that
uses a TokenGenerator to tokenize the input data when
building the dictionary.

However, when firing features, the WordFeatures class does
not apply this tokenization. If the user is using a custom
tokenizer instead of Model.TokenGenerator, WordFeatures
will not get the correct string, because it sees the raw
token rather than the tokenized one.

Example of a possible scenario:
Assume there is a TokenGenerator that converts special
characters in the input string to a special token,
"SPLCHAR".

In FeatureGenImpl.java:
dict = new WordsInTrain(new TokenGeneratorConvertSpecial());

While firing features, the tokens are not transformed,
so the dictionary lookup uses the raw special character
rather than "SPLCHAR", and feature generation is not
correct.
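
The scenario above can be reproduced in isolation with a minimal sketch. It does not use the package's real classes: TokenGenerator, ConvertSpecial and Dictionary below are hypothetical stand-ins with assumed signatures, and the rule that any token without letters or digits becomes "SPLCHAR" is also an assumption made for illustration. It only shows why a lookup with the raw token misses and how routing the lookup through the same tokenizer might avoid that.

```java
import java.util.HashMap;
import java.util.Map;

public class TokenizerMismatchDemo {
    /** Hypothetical stand-in for a token generator; not the package's real API. */
    interface TokenGenerator {
        String normalize(String rawToken);
    }

    /** Hypothetical tokenizer: a token with no letters or digits becomes "SPLCHAR". */
    static class ConvertSpecial implements TokenGenerator {
        public String normalize(String rawToken) {
            return rawToken.matches("[^A-Za-z0-9]+") ? "SPLCHAR" : rawToken;
        }
    }

    /** Simplified stand-in for a training dictionary keyed by tokenized strings. */
    static class Dictionary {
        private final TokenGenerator tokenizer;
        private final Map<String, Integer> counts = new HashMap<>();

        Dictionary(TokenGenerator tokenizer) {
            this.tokenizer = tokenizer;
        }

        /** Building the dictionary applies the tokenizer to every raw token. */
        void train(String[] rawTokens) {
            for (String raw : rawTokens) {
                counts.merge(tokenizer.normalize(raw), 1, Integer::sum);
            }
        }

        /** Lookup by key as given, with no tokenization (the reported behaviour). */
        int countOf(String key) {
            return counts.getOrDefault(key, 0);
        }

        /** Lookup that tokenizes first, so it agrees with how train() stored keys. */
        int countOfRaw(String rawToken) {
            return countOf(tokenizer.normalize(rawToken));
        }
    }

    public static void main(String[] args) {
        Dictionary dict = new Dictionary(new ConvertSpecial());
        dict.train(new String[] {"price", "$", "&", "tax"});

        // Raw lookup misses, because "$" was stored under "SPLCHAR".
        System.out.println("raw lookup of $: " + dict.countOf("$"));
        // Tokenized lookup finds both special characters seen in training.
        System.out.println("tokenized lookup of $: " + dict.countOfRaw("$"));
    }
}
```

One possible design choice is to keep the tokenizer inside the dictionary, as here, so that callers firing features cannot bypass it by accident.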

-amit
