Hi,
Thanks for your tool. I have used the OpenNLP tool to train on WSJ sections 02-21 and test on WSJ section 23. However, I found the performance I got is lower than that reported in Adwait Ratnaparkhi's paper. Maybe you have already done some experiments on the Penn Treebank, so what is your opinion on this? Thanks.
Li Dengdeng
Hi,
I've done these experiments and my recollection is that performance should be slightly higher than that reported in Adwait's thesis. How much lower are we talking about?
There is some pre-processing that is done by all the parsers I know of. This involves creating a TOP node in place of the unlabeled root node in the PTB and removing all unary productions of the form X->X.
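For illustration, these two transformations could be sketched on bracketed PTB strings roughly like this (a minimal Python sketch using my own toy parser and tree representation, not the OpenNLP code):

```python
import re

def parse(s):
    """Parse a bracketed tree string into (label, children) tuples;
    bare strings among the children are words."""
    tokens = re.findall(r'\(|\)|[^()\s]+', s)
    def helper(i):
        label = tokens[i + 1]              # token right after '('
        i += 2
        children = []
        while tokens[i] != ')':
            if tokens[i] == '(':
                child, i = helper(i)
                children.append(child)
            else:
                children.append(tokens[i])  # a word
                i += 1
        return (label, children), i + 1
    return helper(0)[0]

def strip_unary(tree):
    """Remove unary productions of the form X -> X by collapsing the child."""
    label, children = tree
    children = [strip_unary(c) if isinstance(c, tuple) else c for c in children]
    if len(children) == 1 and isinstance(children[0], tuple) and children[0][0] == label:
        return children[0]
    return (label, children)

def to_str(tree):
    label, children = tree
    parts = [to_str(c) if isinstance(c, tuple) else c for c in children]
    return '(%s %s)' % (label, ' '.join(parts))

# PTB trees come wrapped in an unlabeled outer pair: ( (S ...) ).
# Label that root TOP, then collapse the NP -> NP unary.
raw = '( (S (NP (NP (DT the) (NN dog))) (VP (VBZ barks))) )'
tree = strip_unary(parse('(TOP' + raw[1:]))
print(to_str(tree))
# -> (TOP (S (NP (DT the) (NN dog)) (VP (VBZ barks))))
```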
Hope this helps...Tom
Thanks for your tips. I had already removed all unary productions of the form X->X, but I didn't create a TOP node for each sentence. Maybe I will try it later.
However, after parsing all the sentences, I used the ./evalb script to evaluate the performance, and I found there are some errors (length mismatch) in it, because in the POS tagging process the word ' could be labeled either as punctuation ('') or as a possessive POS tag.
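One way to locate such sentences before running evalb is to count, per line, the tokens evalb will actually score; a small diagnostic sketch (the deleted-tag list is my assumption based on a typical COLLINS.prm, so check yours):

```python
import re

# Tags that evalb deletes before scoring, per a typical COLLINS.prm (assumption).
DELETE_TAGS = {"''", '``', '.', ',', ':', '-NONE-'}

def scored_len(tree_line):
    """Number of leaves evalb will score in one bracketed tree."""
    leaves = re.findall(r'\(([^()\s]+) ([^()\s]+)\)', tree_line)
    return sum(1 for tag, _ in leaves if tag not in DELETE_TAGS)

# ' tagged POS is scored, but ' tagged '' is deleted -- hence length mismatches:
print(scored_len("(NP (NNP Chicago) (POS 's))"))   # -> 2
print(scored_len("(NP (NNP Chicago) ('' '))"))     # -> 1

# To find the offending sentences, compare gold and test files line by line:
# for n, (g, t) in enumerate(zip(open('23.gold'), open('23.test')), 1):
#     if scored_len(g) != scored_len(t):
#         print('length mismatch at sentence', n)
```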
Thanks.
Li Junhui
Hi,
This is normal in cases where the tagger tags ' as POS instead of '', or vice versa. The COLLINS.prm file, which I think comes with evalb, allows for a certain number of error sentences.
For sections 00 and 23 the parser will produce fewer error sentences than are allowed.
Hope this helps...Tom
The following is the result I got. Thanks!
-- All --
Number of sentence = 2416
Number of Error sentence = 15
Number of Skip sentence = 0
Number of Valid sentence = 2401
Bracketing Recall = 83.27
Bracketing Precision = 84.19
Bracketing FMeasure = 83.73
Complete match = 25.86
Average crossing = 1.38
No crossing = 56.77
2 or less crossing = 79.93
Tagging accuracy = 92.44
-- len<=40 --
Number of sentence = 2245
Number of Error sentence = 15
Number of Skip sentence = 0
Number of Valid sentence = 2230
Bracketing Recall = 84.01
Bracketing Precision = 84.89
Bracketing FMeasure = 84.45
Complete match = 27.76
Average crossing = 1.17
No crossing = 59.82
2 or less crossing = 82.47
Tagging accuracy = 92.31
Hi,
Something is odd. I dug up my models from when I ran these experiments (since the models I distribute are trained on some Brown corpus data as well, and on a larger data set for POS tagging) and v1.3 of the code. With those models and that code on section 23, I get the results at the bottom:
Can you go through your training procedure? I suspect something is wrong there.
I have sections 2-21, 39832 sentences.
Here is my first sentence from section 02 after training pre-processing.
(TOP (S (PP-LOC (IN In) (NP (NP (DT an) (NNP Oct.) (CD 19) (NN review) )(PP (IN of) (NP (`` ``) (NP-TTL (DT The) (NN Misanthrope) )('' '') (PP-LOC (IN at) (NP (NP (NNP Chicago) (POS 's) )(NNP Goodman) (NNP Theatre) ))))(PRN (-LRB- -LRB-) (`` ``) (S-HLN (NP-SBJ (VBN Revitalized) (NNS Classics) )(VP (VBP Take) (NP (DT the) (NN Stage) )(PP-LOC (IN in) (NP (NNP Windy) (NNP City) ))))(, ,) ('' '') (NP-TMP (NN Leisure) (CC &) (NNS Arts) )(-RRB- -RRB-) )))(, ,) (NP-SBJ-2 (NP (NP (DT the) (NN role) )(PP (IN of) (NP (NNP Celimene) )))(, ,) (VP (VBN played) (PP (IN by) (NP-LGS (NNP Kim) (NNP Cattrall) )))(, ,) )(VP (VBD was) (VP (ADVP-MNR (RB mistakenly) )(VBN attributed) (PP-CLR (TO to) (NP (NNP Christina) (NNP Haag) ))))(. .) ))
Empty elements are also removed (anything whose word is -NONE-, along with any constituents above them that are left empty).
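Both clean-up steps (the empty-element removal just described, and the function-tag stripping done on the training side) could be sketched like this, using a toy (label, children) tree representation of my own for illustration:

```python
def remove_empties(tree):
    """Drop -NONE- leaves, plus any constituent left with no children.
    Internal nodes are (label, [children]); leaves are (tag, 'word')."""
    label, children = tree
    if isinstance(children, str):                 # a leaf
        return None if label == '-NONE-' else tree
    kept = [c for c in (remove_empties(ch) for ch in children) if c is not None]
    return (label, kept) if kept else None

def strip_function_tags(label):
    """NP-SBJ-2 -> NP, but leave -NONE-, -LRB-, -RRB- alone."""
    if label in ('-NONE-', '-LRB-', '-RRB-'):
        return label
    return label.split('=')[0].split('-')[0]

# (S (NP-SBJ (-NONE- *T*)) (VP (VB go)))  ->  (S (VP (VB go)))
tree = ('S', [('NP-SBJ', [('-NONE-', '*T*')]), ('VP', [('VB', 'go')])])
print(remove_empties(tree))
# -> ('S', [('VP', [('VB', 'go')])])
```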
-- All --
Number of sentence = 2416
Number of Error sentence = 1
Number of Skip sentence = 0
Number of Valid sentence = 2415
Bracketing Recall = 86.53
Bracketing Precision = 86.85
Complete match = 32.88
Average crossing = 1.24
No crossing = 61.16
2 or less crossing = 81.95
Tagging accuracy = 96.67
-- len<=40 --
Number of sentence = 2245
Number of Error sentence = 1
Number of Skip sentence = 0
Number of Valid sentence = 2244
Bracketing Recall = 87.11
Bracketing Precision = 87.53
Complete match = 35.16
Average crossing = 1.06
No crossing = 63.95
2 or less crossing = 84.27
Tagging accuracy = 96.63
Thanks...Tom
Hi, Thanks for your reply and your results.
There are also 39832 training sentences in my training file, and the first sentence is
(TOP (S (PP (IN In)(NP (NP (DT an)(NNP Oct.)(CD 19)(NN review))(PP (IN of)(NP (`` ``)(NP (DT The)(NN Misanthrope))('' '')(PP (IN at)(NP (NP (NNP Chicago)(POS 's))(NNP Goodman)(NNP Theatre)))))(PRN (-LRB- -LRB-)(`` ``)(S (NP (VBN Revitalized)(NNS Classics))(VP (VBP Take)(NP (DT the)(NN Stage))(PP (IN in)(NP (NNP Windy)(NNP City)))))(, ,)('' '')(NP (NN Leisure)(CC &)(NNS Arts))(-RRB- -RRB-))))(, ,)(NP (NP (NP (DT the)(NN role))(PP (IN of)(NP (NNP Celimene))))(, ,)(VP (VBN played)(PP (IN by)(NP (NNP Kim)(NNP Cattrall))))(, ,))(VP (VBD was)(VP (ADVP (RB mistakenly))(VBN attributed)(PP (TO to)(NP (NNP Christina)(NNP Haag)))))(. .)) )
While preparing the training sentences, I got rid of function tags, -NONE- nodes, etc.
I used the following command to train the models:
opennlp.tools.parser.ParserME -dict -tag -chunk -build -check E:\eclipse\OPENLP\PENN_Train.MRG E:\eclipse\OPENLP\English\parser 100 5
where PENN_Train.MRG is the file with 39832 sentences.
And I used this command to test the performance:
opennlp.tools.lang.english.TreebankParser -d -bs 20 -ap 0.95 E:\eclipse\OPENLP\english\parser E:\eclipse\OPENLP\PENN_Test_Raw.txt
where PENN_Test_Raw.txt is the file with test sentences. One sentence per line in the file.
Meanwhile, I guess the low performance I got could be attributed to low POS tagging accuracy. It might have something to do with the tagdict file. I didn't prepare the file myself; I just used the tagdict file downloaded from http://opennlp.sourceforge.net/models/english/parser/.
Thanks!
Or could you tell me the commands you used to get the 86.53 recall and 86.85 precision?
Thanks.
Hi,
Sorry it's taken me a few days to respond. I somehow missed your previous post too.
I have remembered, though, that if you don't have the TOP nodes in the gold evaluation file as well, you'll get poor recall results, as EVALB will count the TOP nodes as missing brackets:
EVALB/evalb -p EVALB/COLLINS.prm 23.gold 23.opennlp-1.3
With that you get:
-- All --
Number of sentence = 2416
Number of Error sentence = 1
Number of Skip sentence = 0
Number of Valid sentence = 2415
Bracketing Recall = 82.06
Bracketing Precision = 86.85
Complete match = 0.00
Average crossing = 1.24
No crossing = 61.16
2 or less crossing = 81.95
Tagging accuracy = 96.67
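So the fix for the recall numbers is simply to wrap each gold tree in a TOP node before running evalb; for example (a sketch, assuming one tree per line in the gold file; the commented filenames just echo the command above):

```python
def add_top(tree_line):
    """Wrap one bracketed tree in a TOP node to match the parser output."""
    tree_line = tree_line.strip()
    return '(TOP %s)' % tree_line if tree_line else ''

print(add_top('(S (NP (PRP I)) (VP (VB agree)) (. .))'))
# -> (TOP (S (NP (PRP I)) (VP (VB agree)) (. .)))

# e.g. rewrite the gold file before scoring:
# with open('23.gold') as f, open('23.gold.top', 'w') as out:
#     out.writelines(add_top(line) + '\n' for line in f if line.strip())
```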
This still doesn't explain your tagging accuracy issue. I doubt it's the tag dictionary, as it's actually a little better than what you'll get out of the treebank data by default.
Your commands look ok, although since you are using the default values you might try them without all the options. (I see that for training there is a bug where you need to have at least three options.)
training: java -mx1800M opennlp.tools.parser.ParserME wsj.train.ustrip models 100 5
testing: java -mx500M opennlp.tools.lang.english.TreebankParser -d models < 23.tok > 23.opennlp-1.3
I'll re-train this on that data and report back the training output in the next day or so (it takes a while, so I'll run it tonight).
Hope this helps...Tom
Hi,
I noticed that there is something wrong with the tagger training when it is done via the parser. For the above post I kicked off the job to make sure I had the command right, and saw that the tagger was converging slowly. I've always trained the tagger separately in the past, so I had never encountered this.
Here's what I did to train the tagger:
perl -ne 'while(/\(([^()]+)\)/g) { @parts=split(/ /,$1); print "$parts[1]_$parts[0] ";} print "\n";' wsj.train.ustrip | perl -ne 's/\s+$//; print "$_\n";' > wsj.train.tag
java -mx500M opennlp.tools.postag.POSTaggerME wsj.train.tag tag.bin.gz
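For reference, the perl pipeline above can be expressed as a short Python equivalent (same word_TAG output format; a sketch only):

```python
import re

def tree_to_tagged(tree_line):
    """Extract the (TAG word) leaves of a bracketed tree and emit the
    word_TAG format used for POSTaggerME training, mirroring the perl
    one-liner above."""
    leaves = re.findall(r'\(([^()\s]+) ([^()\s]+)\)', tree_line)
    return ' '.join('%s_%s' % (word, tag) for tag, word in leaves)

print(tree_to_tagged('(TOP (S (NP (PRP I)) (VP (VB agree))) (. .))'))
# -> I_PRP agree_VB ._.
```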
I'll look for what the issue is with training the tagger via the parser.
Thanks...Tom
Hi,
You can fix the bug in tag training with the following change to opennlp.tools.parser.ParserEventStream.java:
68 else if (etype == EventTypeEnum.TAG) {
69 this.tcg = new DefaultPOSContextGenerator(null);
70 }
Pass null to the DefaultPOSContextGenerator constructor rather than the dictionary.
Hope this helps...Tom