Training of Tokenizer in Wordfreak

Forum: Help

Creator: kartinka

Created: 2005-11-01

Updated: 2013-04-09

kartinka - 2005-11-01

Hello!

I would like to train the OpenNLP tokenizer in WordFreak, but I have no idea how to do it. The "train files" button isn't activated. Has anybody trained this tokenizer?

Thank you for any help and ideas!

- Thomas Morton - 2005-11-03
  
  Training is not supported for the tokenizer within WordFreak. In general, the train option is going to be removed from WordFreak, since most training procedures take several hours and it doesn't make sense to run them from within a UI.
  
  In order to train the tokenizer on WordFreak files which have their tokens annotated, you need to run the main() of OpenNlpTokenAnnotator. The parameters are: model file1.txt .. filen.txt
  Each of the files needs a corresponding .ann file.
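  
  A minimal sketch of a pre-flight check for that argument list, since a missing .ann file is easy to overlook before a run that takes hours. The class TokenizerTrainingArgs and its methods are hypothetical helpers, not part of WordFreak or OpenNLP, and the naming rule (file1.txt pairs with file1.txt.ann in the same directory) is an assumption you should check against your own files.
  
  ```java
  import java.nio.file.Files;
  import java.nio.file.Path;
  import java.nio.file.Paths;
  import java.util.ArrayList;
  import java.util.List;

  // Hypothetical helper: validates inputs and prints the argument list
  // ("model file1.txt .. filen.txt") to pass to OpenNlpTokenAnnotator's main().
  class TokenizerTrainingArgs {

      // ASSUMPTION: the annotation file for "doc.txt" is "doc.txt.ann",
      // stored next to the text file. Adjust if your files are named differently.
      public static Path annFileFor(Path textFile) {
          return textFile.resolveSibling(textFile.getFileName() + ".ann");
      }

      // Returns the model name followed by every text file, after checking
      // that each text file has its annotation file.
      public static List<String> buildArgs(String modelName, List<Path> textFiles) {
          List<String> args = new ArrayList<>();
          args.add(modelName);
          for (Path textFile : textFiles) {
              if (!Files.isRegularFile(annFileFor(textFile))) {
                  throw new IllegalArgumentException(
                          "Missing annotation file for " + textFile);
              }
              args.add(textFile.toString());
          }
          return args;
      }

      public static void main(String[] argv) {
          if (argv.length < 2) {
              System.err.println("usage: TokenizerTrainingArgs model file1.txt .. filen.txt");
              return;
          }
          List<Path> textFiles = new ArrayList<>();
          for (int i = 1; i < argv.length; i++) {
              textFiles.add(Paths.get(argv[i]));
          }
          // Print the validated arguments, space separated.
          System.out.println(String.join(" ", buildArgs(argv[0], textFiles)));
      }
  }
  ```
  
  The helper only validates and echoes the arguments; the training itself still happens by running the main() of OpenNlpTokenAnnotator with them.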
  
  Have you created files which are annotated for tokenization within WordFreak? In order to train the tokenizer, you need a training set large enough for the model to learn how to tokenize the data. There is currently no mechanism to "add to" or improve the existing tokenization models that are distributed with OpenNLP, so you need a fair amount of data to create models that actually work better than those on your domain. Hope this helps...Tom
  