I remember implementing a PoS tagger in LISP (((( )))) ;o) a year ago while at Cam. CL...
... Using unigrams, the performance was horrible (about 20% accuracy if lucky), while you could get a 30%+ improvement by using N-grams with N > 1 (speed is another issue).
How does Grok deal with these issues? What are Grok's performance rates?
jcgrij.
Grok uses a maximum entropy approach based on Ratnaparkhi's work (see ftp://ftp.cis.upenn.edu/pub/ircs/tr/98-15/98-15.ps.gz).
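For reference, the tagger in that paper is a conditional log-linear ("maxent") model. Paraphrasing the general form from memory (see the paper above for the exact formulation):

    p(t | h) = exp( sum_j lambda_j * f_j(h, t) ) / Z(h)

where t is the candidate tag, h is the local history (surrounding words, the previous one or two tags, word prefixes/suffixes, capitalization), the f_j are binary features, the lambda_j are learned weights, and Z(h) normalizes over the tagset. Since the previous tags are part of h, the model gets bigram/trigram-style context as ordinary features rather than through a separate N-gram table, which is one reason it does much better than a unigram baseline.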
As I recall, performance on Wall Street Journal text (the training domain for our default model) is above 96%, and we find it does quite well on other domains. And if you want to tailor it to a specific domain and have training data for it, you can train a new model that the opennlp.grok.preprocess.postag.POSTaggerME class can use to good effect.
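To make that concrete, here is a minimal Java sketch of tagging with POSTaggerME. The no-argument constructor (loading the default WSJ-trained model) and the tag(String) method returning "token/TAG" pairs are assumptions from memory, not confirmed API; check the Grok javadoc before relying on them.

    // Hypothetical sketch: constructor and tag() signature are assumed,
    // not verified against the Grok javadoc.
    import opennlp.grok.preprocess.postag.POSTaggerME;

    public class TagDemo {
        public static void main(String[] args) {
            // Assumed: no-arg constructor loads the default WSJ model.
            POSTaggerME tagger = new POSTaggerME();
            // Assumed: tag(String) returns the sentence with "word/TAG" pairs.
            String tagged = tagger.tag("Grok tags this sentence .");
            System.out.println(tagged);
        }
    }

For a domain-specific model, the idea is the same: train a new maxent model on your annotated data and point POSTaggerME at it instead of the default.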