
The parsing performance of OpenNLP

  • ldd

    ldd - 2008-05-18

    Hi:

    Thanks for your tool. I have used the OpenNLP tool to train on WSJ sections 02-21 and test on WSJ section 23. However, I found that the performance I got is lower than that reported in Adwait Ratnaparkhi's paper. Maybe you have already done some experiments on the Penn Treebank, so what is your opinion about this? Thanks.

    Li Dengdeng

     
    • Thomas Morton

      Thomas Morton - 2008-05-19

      Hi,
         I've done these experiments and my recollection is that performance should be slightly higher than that reported in Adwait's thesis.  How much lower are we talking about?

         There is some pre-processing that is done by all the parsers I know of.  This involves creating a TOP node instead of the unlabeled node in the PTB and removing all unary productions of the form X->X.

      Hope this helps...Tom
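
      A minimal sketch of the pre-processing described above (illustrative only, not OpenNLP's own code; the class and method names are made up): read a bracketed tree, relabel the PTB's empty root as TOP, and splice out unary X->X chains.

      import java.util.ArrayList;
      import java.util.List;

      public class PtbTree {
          String label;                        // constituent label, or the word itself for leaves
          List<PtbTree> children = new ArrayList<>();

          boolean isLeaf() { return children.isEmpty(); }

          static PtbTree parse(String s) { return parse(s, new int[] {0}); }

          private static PtbTree parse(String s, int[] p) {
              while (Character.isWhitespace(s.charAt(p[0]))) p[0]++;
              PtbTree node = new PtbTree();
              if (s.charAt(p[0]) != '(') {     // a bare token: a leaf (word)
                  int start = p[0];
                  while (s.charAt(p[0]) != ')' && !Character.isWhitespace(s.charAt(p[0]))) p[0]++;
                  node.label = s.substring(start, p[0]);
                  return node;
              }
              p[0]++;                           // consume '('
              int start = p[0];
              while (s.charAt(p[0]) != '(' && s.charAt(p[0]) != ')'
                     && !Character.isWhitespace(s.charAt(p[0]))) p[0]++;
              node.label = s.substring(start, p[0]);   // empty for the PTB's unlabeled root
              while (true) {
                  while (Character.isWhitespace(s.charAt(p[0]))) p[0]++;
                  if (s.charAt(p[0]) == ')') { p[0]++; return node; }
                  node.children.add(parse(s, p));
              }
          }

          // Give the unlabeled PTB root the label TOP.
          void labelTop() { if (label.isEmpty()) label = "TOP"; }

          // Remove unary productions X -> X by splicing out the redundant child.
          void collapseUnary() {
              while (children.size() == 1 && !children.get(0).isLeaf()
                     && label.equals(children.get(0).label)) {
                  children = children.get(0).children;
              }
              for (PtbTree c : children) c.collapseUnary();
          }

          public String toString() {
              if (isLeaf()) return label;
              StringBuilder sb = new StringBuilder("(").append(label);
              for (PtbTree c : children) sb.append(' ').append(c);
              return sb.append(')').toString();
          }

          public static void main(String[] args) {
              PtbTree t = parse("( (S (NP (NP (DT The) (NN role))) (VP (VBD was) (VP (VBN played)))) )");
              t.labelTop();
              t.collapseUnary();
              System.out.println(t);
              // prints: (TOP (S (NP (DT The) (NN role)) (VP (VBD was) (VP (VBN played)))))
          }
      }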

       
    • ldd

      ldd - 2008-05-20

      Thanks for your tips. I did remove all unary productions of the form X->X, but I didn't create a TOP node for each sentence. Maybe I will try that later.

      However, after parsing all the sentences, I used the ./evalb script to evaluate the performance, and I found there are some errors (length mismatches) in it, because in the POS tagging process the word ' could be labeled either as punctuation or as the POS tag.

      Thanks.

      Li Junhui

       
      • Thomas Morton

        Thomas Morton - 2008-05-20

        Hi,
          This is normal in cases where the tagger mistakes ' as POS vs '' or vice versa.  The COLLINS.prm file, which I think comes with evalb, allows for a certain number of error sentences.
        For sections 00 and 23 the parser will produce fewer error sentences than are allowed.

        Hope this helps...Tom
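
      A quick way to locate the sentences evalb flags as errors is to compare the number of (TAG word) leaves per line in the gold file and the parser output. The sketch below is illustrative only (evalb applies its own deletions from the .prm file before comparing, so this simple count is just a rough guide); the class name and file arguments are assumptions, and it assumes both files have been pre-processed the same way (no -NONE- elements).

      import java.io.IOException;
      import java.nio.file.Files;
      import java.nio.file.Paths;
      import java.util.List;
      import java.util.regex.Matcher;
      import java.util.regex.Pattern;

      public class LengthCheck {
          // counts the terminals of one bracketed tree by matching "(TAG word)" pairs
          private static final Pattern LEAF = Pattern.compile("\\(\\S+ [^()\\s]+\\)");

          static int leaves(String tree) {
              Matcher m = LEAF.matcher(tree);
              int n = 0;
              while (m.find()) n++;
              return n;
          }

          public static void main(String[] args) throws IOException {
              List<String> gold = Files.readAllLines(Paths.get(args[0]));   // e.g. the gold trees
              List<String> test = Files.readAllLines(Paths.get(args[1]));   // e.g. the parser output
              for (int i = 0; i < Math.min(gold.size(), test.size()); i++) {
                  int g = leaves(gold.get(i)), t = leaves(test.get(i));
                  if (g != t) {
                      System.out.printf("sentence %d: gold %d tokens, output %d tokens%n", i + 1, g, t);
                  }
              }
          }
      }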

         
    • ldd

      ldd - 2008-05-21

      The following is the result I got. Thanks!
      -- All --
      Number of sentence        =   2416
      Number of Error sentence  =     15
      Number of Skip  sentence  =      0
      Number of Valid sentence  =   2401
      Bracketing Recall         =  83.27
      Bracketing Precision      =  84.19
      Bracketing FMeasure       =  83.73
      Complete match            =  25.86
      Average crossing          =   1.38
      No crossing               =  56.77
      2 or less crossing        =  79.93
      Tagging accuracy          =  92.44

      -- len<=40 --
      Number of sentence        =   2245
      Number of Error sentence  =     15
      Number of Skip  sentence  =      0
      Number of Valid sentence  =   2230
      Bracketing Recall         =  84.01
      Bracketing Precision      =  84.89
      Bracketing FMeasure       =  84.45
      Complete match            =  27.76
      Average crossing          =   1.17
      No crossing               =  59.82
      2 or less crossing        =  82.47
      Tagging accuracy          =  92.31

       
      • Thomas Morton

        Thomas Morton - 2008-05-21

        Hi,
           Something is odd.  I dug up my models from when I did these experiments (since the models I distribute are trained on some Brown corpus data as well, plus a larger data set for the POS tagger) and v1.3 of the code.  With those models and that code on section 23 I get the results at the bottom:

        Can you go through your training procedure?  I suspect something is wrong there.

        I have sections 2-21, 39832 sentences. 

        Here is my first sentence from section 02 after training pre-processing.

        (TOP  (S (PP-LOC (IN In)  (NP (NP (DT an)  (NNP Oct.)  (CD 19)  (NN review)  )(PP (IN of)  (NP (`` ``)  (NP-TTL (DT The)  (NN Misanthrope)  )('' '')  (PP-LOC (IN at)  (NP (NP (NNP Chicago)  (POS 's)  )(NNP Goodman)  (NNP Theatre)  ))))(PRN (-LRB- -LRB-) (`` ``)  (S-HLN (NP-SBJ (VBN Revitalized)  (NNS Classics)  )(VP (VBP Take)  (NP (DT the)  (NN Stage)  )(PP-LOC (IN in)  (NP (NNP Windy)  (NNP City)  ))))(, ,)  ('' '')  (NP-TMP (NN Leisure)  (CC &)  (NNS Arts)  )(-RRB- -RRB-)  )))(, ,)  (NP-SBJ-2 (NP (NP (DT the)  (NN role)  )(PP (IN of)  (NP (NNP Celimene)  )))(, ,)  (VP (VBN played)  (PP (IN by)  (NP-LGS (NNP Kim)  (NNP Cattrall)  )))(, ,)  )(VP (VBD was)  (VP (ADVP-MNR (RB mistakenly)  )(VBN attributed)  (PP-CLR (TO to)  (NP (NNP Christina)  (NNP Haag)  ))))(. .)  ))

        Empty elements are also removed (anything whose words are -NONE-, along with any constituents above them that are left empty).

        -- All --
        Number of sentence        =   2416
        Number of Error sentence  =      1
        Number of Skip  sentence  =      0
        Number of Valid sentence  =   2415
        Bracketing Recall         =  86.53
        Bracketing Precision      =  86.85
        Complete match            =  32.88
        Average crossing          =   1.24
        No crossing               =  61.16
        2 or less crossing        =  81.95
        Tagging accuracy          =  96.67

        -- len<=40 --
        Number of sentence        =   2245
        Number of Error sentence  =      1
        Number of Skip  sentence  =      0
        Number of Valid sentence  =   2244
        Bracketing Recall         =  87.11
        Bracketing Precision      =  87.53
        Complete match            =  35.16
        Average crossing          =   1.06
        No crossing               =  63.95
        2 or less crossing        =  84.27
        Tagging accuracy          =  96.63

        Thanks...Tom
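
      As a concrete illustration of the empty-element removal described above (a rough sketch, not OpenNLP's actual preprocessing code): delete every (-NONE- ...) leaf, then keep deleting any constituent left with no children until nothing changes.

      import java.util.regex.Pattern;

      public class StripEmpties {
          private static final Pattern TRACE = Pattern.compile("\\(-NONE-[^()]*\\)");
          private static final Pattern EMPTY = Pattern.compile("\\(\\S+\\s*\\)");

          static String strip(String tree) {
              String previous;
              do {
                  previous = tree;
                  tree = TRACE.matcher(tree).replaceAll("");   // drop the traces themselves
                  tree = EMPTY.matcher(tree).replaceAll("");   // drop constituents left empty
              } while (!tree.equals(previous));
              return tree;
          }

          public static void main(String[] args) {
              System.out.println(strip(
                  "(TOP (S (NP-SBJ-1 (-NONE- *T*)) (VP (VBD fell) (NP (-NONE- *)))))"));
              // prints: (TOP (S  (VP (VBD fell) )))  -- leftover whitespace can be
              // normalized afterwards if the training code is picky about it
          }
      }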

         
    • ldd

      ldd - 2008-05-21

      Hi, Thanks for your reply and your results.

      There are also 39832 training sentences in my training file, and the first sentence is
      (TOP (S (PP (IN In)(NP (NP (DT an)(NNP Oct.)(CD 19)(NN review))(PP (IN of)(NP (`` ``)(NP (DT The)(NN Misanthrope))('' '')(PP (IN at)(NP (NP (NNP Chicago)(POS 's))(NNP Goodman)(NNP Theatre)))))(PRN (-LRB- -LRB-)(`` ``)(S (NP (VBN Revitalized)(NNS Classics))(VP (VBP Take)(NP (DT the)(NN Stage))(PP (IN in)(NP (NNP Windy)(NNP City)))))(, ,)('' '')(NP (NN Leisure)(CC &)(NNS Arts))(-RRB- -RRB-))))(, ,)(NP (NP (NP (DT the)(NN role))(PP (IN of)(NP (NNP Celimene))))(, ,)(VP (VBN played)(PP (IN by)(NP (NNP Kim)(NNP Cattrall))))(, ,))(VP (VBD was)(VP (ADVP (RB mistakenly))(VBN attributed)(PP (TO to)(NP (NNP Christina)(NNP Haag)))))(. .)) )
      While preparing the training sentences, I got rid of function tags, -NONE- nodes, etc.

      I used the following command to train the models:
      opennlp.tools.parser.ParserME -dict -tag -chunk -build -check E:\eclipse\OPENLP\PENN_Train.MRG E:\eclipse\OPENLP\English\parser 100 5
      where PENN_Train.MRG is the file with 39832 sentences.

      And I used this command to test the performance:
      opennlp.tools.lang.english.TreebankParser -d -bs 20 -ap 0.95 E:\eclipse\OPENLP\english\parser E:\eclipse\OPENLP\PENN_Test_Raw.txt
      where PENN_Test_Raw.txt is the file with the test sentences, one sentence per line.

      Meanwhile, I guess the low performance I got could be attributed to the low POS tagging performance. It might have something to do with the tagdict file: I didn't prepare that file myself; I just used the tagdict file downloaded from http://opennlp.sourceforge.net/models/english/parser/.

      Thanks!

       
    • ldd

      ldd - 2008-05-23

      Or could you tell me the commands you used to get the 86.53 recall and 86.85 precision?

      Thanks.

       
      • Thomas Morton

        Thomas Morton - 2008-05-28

        Hi,
           Sorry it's taken me a few days to respond.  I somehow missed your previous post too.

           I have remembered, though, that if you don't have the TOP nodes in the evaluation file as well, you'll get poor recall results, as EVALB will count the top nodes as missing:

        EVALB/evalb -p EVALB/COLLINS.prm 23.gold 23.opennlp-1.3

        With that you get:
        -- All --
        Number of sentence        =   2416
        Number of Error sentence  =      1
        Number of Skip  sentence  =      0
        Number of Valid sentence  =   2415
        Bracketing Recall         =  82.06
        Bracketing Precision      =  86.85
        Complete match            =   0.00
        Average crossing          =   1.24
        No crossing               =  61.16
        2 or less crossing        =  81.95
        Tagging accuracy          =  96.67

        This still doesn't explain your tagging accuracy issue.  I doubt it's the tag dictionary, as it's actually a little better than what you'll get out of the treebank data by default.

        Your commands look OK, although since you are using the default values you might try them without all the options.  (I see that for training there is a bug where you need to have at least three options.)

        training:  java -mx1800M opennlp.tools.parser.ParserME wsj.train.ustrip models 100 5
        testing: java -mx500M opennlp.tools.lang.english.TreebankParser -d models < 23.tok > 23.opennlp-1.3

        I'll re-train this on that data and report back the training output in the next day or so (it takes a while, so I'll run it tonight).

        Hope this helps...Tom
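
      One way to make sure the gold file handed to evalb carries the same TOP nodes as the parser output is a small wrapper along these lines (a sketch only; the class name, file handling, and the assumption about the gold file's format are all illustrative):

      import java.io.IOException;
      import java.nio.file.Files;
      import java.nio.file.Paths;

      public class AddTop {
          public static void main(String[] args) throws IOException {
              for (String line : Files.readAllLines(Paths.get(args[0]))) {
                  String t = line.trim();
                  if (t.isEmpty()) continue;
                  if (t.startsWith("( ")) {
                      // "( (S ...) )"  ->  "(TOP (S ...))"
                      t = "(TOP " + t.substring(2, t.length() - 1).trim() + ")";
                  } else if (!t.startsWith("(TOP")) {
                      // "(S ...)"      ->  "(TOP (S ...))"
                      t = "(TOP " + t + ")";
                  }
                  System.out.println(t);
              }
          }
      }

      Usage would be something like: java AddTop 23.gold > 23.gold.top, then point evalb at 23.gold.top.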

         
    • Thomas Morton

      Thomas Morton - 2008-05-28

      Hi,
         I noticed that there is something wrong with the tagger training when done via the parser.  For the above post I kicked the job off to make sure I had the command right and saw that the tagger was converging slowly.  I've always trained the tagger separately in the past so I've never encountered this.

      Here's what I did to train the tagger:

      perl -ne 'while(/\(([^()]+)\)/g) { @parts=split(/ /,$1); print "$parts[1]_$parts[0] ";} print "\n";' wsj.train.ustrip | perl -ne 's/\s+$//; print "$_\n";' > wsj.train.tag

      java -mx500M opennlp.tools.postag.POSTaggerME wsj.train.tag tag.bin.gz

      I'll look for what the issue is with training the tagger via the parser.

      Thanks...Tom
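
      For reference, applied to the first training sentence quoted earlier in this thread, that pipeline produces a line beginning:

      In_IN an_DT Oct._NNP 19_CD review_NN of_IN ``_`` The_DT Misanthrope_NN ''_'' at_IN Chicago_NNP 's_POS Goodman_NNP Theatre_NNP ...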

       
    • Thomas Morton

      Thomas Morton - 2008-05-28

      Hi,
         You can fix the bug in tag training with the following change to opennlp.tools.parser.ParserEventStream.java:

      68   else if (etype == EventTypeEnum.TAG) {
      69     this.tcg = new DefaultPOSContextGenerator(null);
      70   }

      Pass a null to DefaultPOSContextGenerator rather than the dictionary.

      Hope this helps...Tom

       
