Training the Tokenizer in WordFreak

Help
kartinka
2005-11-01
2013-04-09
  • kartinka

    kartinka - 2005-11-01

    Hello!

    I would like to train the OpenNLP tokenizer in WordFreak, but I have no idea how to do it. The button (train files) isn't activated. Has anybody trained this tokenizer?

    Thank you for any help and ideas!

     
    • Thomas Morton

      Thomas Morton - 2005-11-03

      Training is not supported for the tokenizer within WordFreak.  In general, the train option is going to be removed from WordFreak, since most training procedures take several hours and it doesn't make sense to run them within a UI.

      To train the tokenizer on WordFreak files that have their tokens annotated, you need to run the main() of OpenNlpTokenAnnotator.  The parameters are: model file1.txt .. filen.txt
      Each of the text files needs a corresponding .ann file.
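
Since a training run can take hours, it may be worth checking up front that every text file really has its annotation file. The following is only a minimal sketch, not part of WordFreak or OpenNLP: the class name AnnFileCheck is hypothetical, and it assumes the annotation file is named by appending ".ann" to the text file name (file1.txt -> file1.txt.ann). Adjust annFileFor() if your files are named differently.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight helper; not part of WordFreak or OpenNLP.
class AnnFileCheck {

    // ASSUMPTION: the annotation file is the text file's name plus ".ann"
    // (file1.txt -> file1.txt.ann). Change this if your naming differs.
    static Path annFileFor(Path textFile) {
        return textFile.resolveSibling(textFile.getFileName().toString() + ".ann");
    }

    // Returns the text files that have no annotation file next to them.
    static List<Path> missingAnnotations(List<Path> textFiles) {
        List<Path> missing = new ArrayList<>();
        for (Path textFile : textFiles) {
            if (!Files.isRegularFile(annFileFor(textFile))) {
                missing.add(textFile);
            }
        }
        return missing;
    }

    // Usage: java AnnFileCheck file1.txt .. filen.txt
    public static void main(String[] args) {
        List<Path> textFiles = new ArrayList<>();
        for (String arg : args) {
            textFiles.add(Paths.get(arg));
        }
        List<Path> missing = missingAnnotations(textFiles);
        for (Path textFile : missing) {
            System.err.println("No annotation file found for " + textFile);
        }
        if (!missing.isEmpty()) {
            System.exit(1);
        }
    }
}
```

Pass it only the text files (not the model name); it exits with a non-zero status if any annotation file is missing.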

      Have you created files that are annotated for tokenization within WordFreak?  To train the tokenizer you need a training set large enough for the model to learn how to tokenize the data.  There is currently no mechanism to "add to" or improve the existing tokenization models that are distributed with OpenNLP, so you need a fair amount of data to create models that work better on your domain than the distributed ones.  Hope this helps...Tom

       
