
N-grams in PoS tagging?

2002-02-14
  • JC Grijelmo-Menchon

    I remember implementing a PoS tagger in LISP (((( )))) ;o) about a year ago while at Cam. CL...

    ... using unigrams, the accuracy was horrible (about 20% if lucky), while using N-grams with N > 1 gave a 30%+ improvement (speed is another issue).
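
    (A toy sketch of the difference, in Java rather than my original LISP, with made-up counts just to show why the previous tag helps on an ambiguous word:)

        import java.util.*;

        // Toy contrast between unigram and bigram tag selection; the counts
        // below are invented, only to show why context helps on ambiguous words.
        public class NgramToy {
            public static void main(String[] args) {
                // tag counts for the ambiguous word "flies": plural noun vs. verb
                Map<String, Integer> fliesTags = Map.of("NNS", 21, "VBZ", 23);
                // how often each tag follows a determiner vs. a noun (toy transition counts)
                Map<String, Map<String, Integer>> after = Map.of(
                        "DT", Map.of("NNS", 80, "VBZ", 2),
                        "NN", Map.of("NNS", 10, "VBZ", 60));

                // unigram: ignore context, always pick the globally most frequent tag
                String unigram = Collections.max(fliesTags.entrySet(),
                        Map.Entry.comparingByValue()).getKey();
                System.out.println("unigram choice for 'flies': " + unigram); // always VBZ

                // bigram: weight the word evidence by the tag that came before
                for (String prev : List.of("DT", "NN")) {
                    String best = null;
                    long bestScore = -1;
                    for (Map.Entry<String, Integer> e : fliesTags.entrySet()) {
                        long score = (long) e.getValue()
                                * after.get(prev).getOrDefault(e.getKey(), 0);
                        if (score > bestScore) { bestScore = score; best = e.getKey(); }
                    }
                    System.out.println("bigram choice after " + prev + ": " + best);
                }
            }
        }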

    How does Grok deal with these issues? What accuracy does Grok achieve?

    jcgrij.

    • Jason Baldridge - 2002-02-14

      Grok uses a maximum entropy approach based on Ratnaparkhi's work (see ftp://ftp.cis.upenn.edu/pub/ircs/tr/98-15/98-15.ps.gz).
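
      To give a flavour of what that means (this is only an illustration of the kind of contextual predicates described in Ratnaparkhi's paper, not Grok's exact feature set), the model scores each candidate tag from binary features of the word and its context, roughly like:

          import java.util.*;

          // Illustrative Ratnaparkhi-style contextual predicates for position i;
          // not Grok's actual feature extraction code.
          public class MaxentContext {
              static List<String> features(String[] words, String[] tags, int i) {
                  List<String> f = new ArrayList<>();
                  f.add("w=" + words[i]);                                          // current word
                  f.add("w-1=" + (i > 0 ? words[i - 1] : "*BOS*"));                // previous word
                  f.add("w+1=" + (i + 1 < words.length ? words[i + 1] : "*EOS*")); // next word
                  f.add("t-1=" + (i > 0 ? tags[i - 1] : "*BOS*"));                 // previous tag
                  f.add("t-2,t-1=" + (i > 1 ? tags[i - 2] : "*BOS*") + ","
                          + (i > 0 ? tags[i - 1] : "*BOS*"));                      // previous two tags
                  String w = words[i];
                  for (int k = 1; k <= Math.min(4, w.length()); k++) {             // prefixes/suffixes
                      f.add("pre=" + w.substring(0, k));
                      f.add("suf=" + w.substring(w.length() - k));
                  }
                  if (w.matches(".*\\d.*")) f.add("hasDigit");                     // surface clues
                  if (w.contains("-")) f.add("hasHyphen");
                  if (!w.equals(w.toLowerCase(Locale.ROOT))) f.add("hasUpper");
                  return f;
              }
          }

      The model learns a weight for each (feature, tag) pair, and tagging searches for the tag sequence with the highest product of the resulting conditional probabilities (Ratnaparkhi uses a beam search).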

      As I recall, accuracy on Wall Street Journal text (the training domain for our default model) is >96%, and we find it does quite well on other domains. If you want to tailor it to a specific domain and have training data for it, you can train a new model that the opennlp.grok.preprocess.postag.POSTaggerME class can use to good effect.
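
      Purely as a usage sketch (the no-argument constructor and the tag method here are assumptions on my part rather than something checked against the javadoc), the pattern looks roughly like:

          import opennlp.grok.preprocess.postag.POSTaggerME;

          public class TagDemo {
              public static void main(String[] args) {
                  // assumed: no-arg constructor loads the default (WSJ-trained) model;
                  // a constructor taking your own trained model would cover the
                  // domain-specific case mentioned above
                  POSTaggerME tagger = new POSTaggerME();
                  // assumed: tag(...) takes a whitespace-tokenized sentence and
                  // returns the tokens annotated with their parts of speech
                  System.out.println(tagger.tag("Grok handles newswire text quite well ."));
              }
          }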

