
Details on how models were trained

  • Jeff

    Jeff - 2008-11-08

    Hi,

    I am using the OpenNLP named entity recognizer for a project, and I'd like to know how the downloadable models were trained so I can include this information in the report.

    In particular, I am using the English person, location, and organization recognizers from the current 1.4 distribution.
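
    For reference, here is a rough sketch of how the name finders can be invoked from Java. It uses the later (1.5-style) OpenNLP API rather than the 1.4 one, and the model file name en-ner-person.bin is only illustrative:

    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.util.Arrays;

    import opennlp.tools.namefind.NameFinderME;
    import opennlp.tools.namefind.TokenNameFinderModel;
    import opennlp.tools.util.Span;

    public class FindPersons {
        public static void main(String[] args) throws Exception {
            // Load one of the downloadable pre-trained models (file name illustrative).
            try (InputStream in = new FileInputStream("en-ner-person.bin")) {
                TokenNameFinderModel model = new TokenNameFinderModel(in);
                NameFinderME finder = new NameFinderME(model);

                // The finder expects pre-tokenized sentences.
                String[] tokens = {"Thomas", "Morton", "works", "on", "OpenNLP", "."};
                for (Span name : finder.find(tokens)) {
                    System.out.println(name.getType() + ": " + String.join(" ",
                            Arrays.copyOfRange(tokens, name.getStart(), name.getEnd())));
                }

                // Reset document-level adaptive features between documents.
                finder.clearAdaptiveData();
            }
        }
    }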

    Are there any documents that describe these details? Training corpus, size, any specific parameters that may have been set?

    Thanks!
    Jeff

    • Thomas Morton

      Thomas Morton - 2008-11-11

      Hi,
         Sorry it's taken a bit to get back to you.  This isn't documented at present. I'm working on a white paper for researchers who use OpenNLP, with this sort of info and something to cite, but it's not ready yet.

      So each of the models is trained on about 2.4 million words of data from the Associated Press, Foreign Broadcast Information Service, Financial Times, LA Times, New York Times, San Jose Mercury News, and the Wall Street Journal.

      A small amount of data from all these sources (probably about 30k words) has been entirely hand-annotated. A larger amount of data from the AP, NYT, and WSJ has been automatically annotated, had some systematic errors involving quotes removed with a script, and had some portion of it hand-corrected as well.
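
      For anyone who wants to reproduce the general setup, here is a rough sketch of training a person-name finder with the later OpenNLP Java API (1.5+). The training file person.train, the format example, and the default parameters are illustrative, not the exact data or settings used for the distributed models:

      import java.io.File;
      import java.io.FileOutputStream;
      import java.io.OutputStream;
      import java.nio.charset.StandardCharsets;

      import opennlp.tools.namefind.NameFinderME;
      import opennlp.tools.namefind.NameSample;
      import opennlp.tools.namefind.NameSampleDataStream;
      import opennlp.tools.namefind.TokenNameFinderFactory;
      import opennlp.tools.namefind.TokenNameFinderModel;
      import opennlp.tools.util.MarkableFileInputStreamFactory;
      import opennlp.tools.util.ObjectStream;
      import opennlp.tools.util.PlainTextByLineStream;
      import opennlp.tools.util.TrainingParameters;

      public class TrainPersonFinder {
          public static void main(String[] args) throws Exception {
              // Annotated data, one sentence per line, e.g.:
              //   <START:person> Thomas Morton <END> maintains OpenNLP .
              ObjectStream<String> lines = new PlainTextByLineStream(
                      new MarkableFileInputStreamFactory(new File("person.train")),
                      StandardCharsets.UTF_8);
              ObjectStream<NameSample> samples = new NameSampleDataStream(lines);

              // Default maxent training settings (100 iterations, feature cutoff 5).
              TrainingParameters params = TrainingParameters.defaultParams();

              TokenNameFinderModel model = NameFinderME.train(
                      "en", "person", samples, params, new TokenNameFinderFactory());

              try (OutputStream out = new FileOutputStream("en-ner-person.bin")) {
                  model.serialize(out);
              }
          }
      }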

      Hope this helps...Tom

    • Jeff

      Jeff - 2008-11-12

      Thanks Tom, those details are great.

      --
      Jeff

