Hi,
Thanks for your tool. I have used the OpenNLP tool to train on WSJ sections 02-21 and test on WSJ section 23. However, I found the performance I got is lower than that reported in Adwait Ratnaparkhi's paper. Maybe you have already done some experiments on the Penn Treebank, so what is your opinion on this? Thanks.
Li Dengdeng
Hi,
I've done these experiments and my recollection is that performance should be slightly higher than that reported in Adwait's thesis. How much lower are we talking about?
There is some pre-processing that is done by all the parsers I know of. This involves creating a TOP node in place of the unlabeled root node in the PTB and removing all unary productions of the form X->X.
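For illustration, these two transformations could be sketched on bracketed PTB strings roughly like this (a minimal Python sketch using my own toy parser and tree representation, not the OpenNLP code):

```python
import re

def parse(s):
    """Parse a bracketed tree string into (label, children) tuples;
    bare strings among the children are words."""
    tokens = re.findall(r'\(|\)|[^()\s]+', s)
    def helper(i):
        label = tokens[i + 1]              # token right after '('
        i += 2
        children = []
        while tokens[i] != ')':
            if tokens[i] == '(':
                child, i = helper(i)
                children.append(child)
            else:
                children.append(tokens[i])  # a word
                i += 1
        return (label, children), i + 1
    return helper(0)[0]

def strip_unary(tree):
    """Remove unary productions of the form X -> X by collapsing the child."""
    label, children = tree
    children = [strip_unary(c) if isinstance(c, tuple) else c for c in children]
    if len(children) == 1 and isinstance(children[0], tuple) and children[0][0] == label:
        return children[0]
    return (label, children)

def to_str(tree):
    label, children = tree
    parts = [to_str(c) if isinstance(c, tuple) else c for c in children]
    return '(%s %s)' % (label, ' '.join(parts))

# PTB trees come wrapped in an unlabeled outer pair: ( (S ...) ).
# Label that root TOP, then collapse the NP -> NP unary.
raw = '( (S (NP (NP (DT the) (NN dog))) (VP (VBZ barks))) )'
tree = strip_unary(parse('(TOP' + raw[1:]))
print(to_str(tree))
# -> (TOP (S (NP (DT the) (NN dog)) (VP (VBZ barks))))
```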
Hope this helps...Tom
Thanks for your tips. I had already removed all unary productions of the form X->X, but I didn't create a TOP node for each sentence. Maybe I will try it later.
However, after parsing all the sentences, I used the ./evalb script to evaluate the performance, and I found there are some errors (length mismatch) in it, because in the POS tagging process the word ' could be labeled either as punctuation ('') or as a possessive POS tag.
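One way to locate such sentences before running evalb is to count, per line, the tokens evalb will actually score; a small diagnostic sketch (the deleted-tag list is my assumption based on a typical COLLINS.prm, so check yours):

```python
import re

# Tags that evalb deletes before scoring, per a typical COLLINS.prm (assumption).
DELETE_TAGS = {"''", '``', '.', ',', ':', '-NONE-'}

def scored_len(tree_line):
    """Number of leaves evalb will score in one bracketed tree."""
    leaves = re.findall(r'\(([^()\s]+) ([^()\s]+)\)', tree_line)
    return sum(1 for tag, _ in leaves if tag not in DELETE_TAGS)

# ' tagged POS is scored, but ' tagged '' is deleted -- hence length mismatches:
print(scored_len("(NP (NNP Chicago) (POS 's))"))   # -> 2
print(scored_len("(NP (NNP Chicago) ('' '))"))     # -> 1

# To find the offending sentences, compare gold and test files line by line:
# for n, (g, t) in enumerate(zip(open('23.gold'), open('23.test')), 1):
#     if scored_len(g) != scored_len(t):
#         print('length mismatch at sentence', n)
```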
Thanks.
Li Junhui
Hi,
This is normal in cases where the tagger tags ' as POS instead of '', or vice versa. The COLLINS.prm file, which I think comes with evalb, allows for a certain number of error sentences.
For sections 00 and 23 the parser will produce fewer error sentences than are allowed.
Hope this helps...Tom
The following is the result I got. Thanks!
-- All --
Number of sentence = 2416
Number of Error sentence = 15
Number of Skip sentence = 0
Number of Valid sentence = 2401
Bracketing Recall = 83.27
Bracketing Precision = 84.19
Bracketing FMeasure = 83.73
Complete match = 25.86
Average crossing = 1.38
No crossing = 56.77
2 or less crossing = 79.93
Tagging accuracy = 92.44
-- len<=40 --
Number of sentence = 2245
Number of Error sentence = 15
Number of Skip sentence = 0
Number of Valid sentence = 2230
Bracketing Recall = 84.01
Bracketing Precision = 84.89
Bracketing FMeasure = 84.45
Complete match = 27.76
Average crossing = 1.17
No crossing = 59.82
2 or less crossing = 82.47
Tagging accuracy = 92.31
Hi,
Something is odd. I dug up my models from when I ran these experiments (since the models I distribute are trained on some Brown corpus data as well, and on a larger data set for POS tagging) and v1.3 of the code. With those models and that code on section 23, I get the results at the bottom:
Can you go through your training procedure? I suspect something is wrong there.
I have sections 2-21, 39832 sentences.
Here is my first sentence from section 02 after training pre-processing.
(TOP (S (PP-LOC (IN In) (NP (NP (DT an) (NNP Oct.) (CD 19) (NN review) )(PP (IN of) (NP (`` ``) (NP-TTL (DT The) (NN Misanthrope) )('' '') (PP-LOC (IN at) (NP (NP (NNP Chicago) (POS 's) )(NNP Goodman) (NNP Theatre) ))))(PRN (-LRB- -LRB-) (`` ``) (S-HLN (NP-SBJ (VBN Revitalized) (NNS Classics) )(VP (VBP Take) (NP (DT the) (NN Stage) )(PP-LOC (IN in) (NP (NNP Windy) (NNP City) ))))(, ,) ('' '') (NP-TMP (NN Leisure) (CC &) (NNS Arts) )(-RRB- -RRB-) )))(, ,) (NP-SBJ-2 (NP (NP (DT the) (NN role) )(PP (IN of) (NP (NNP Celimene) )))(, ,) (VP (VBN played) (PP (IN by) (NP-LGS (NNP Kim) (NNP Cattrall) )))(, ,) )(VP (VBD was) (VP (ADVP-MNR (RB mistakenly) )(VBN attributed) (PP-CLR (TO to) (NP (NNP Christina) (NNP Haag) ))))(. .) ))
Empty elements are also removed (anything whose word is -NONE-, along with any constituents above them that are left empty).
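Both clean-up steps (the empty-element removal just described, and the function-tag stripping done on the training side) could be sketched like this, using a toy (label, children) tree representation of my own for illustration:

```python
def remove_empties(tree):
    """Drop -NONE- leaves, plus any constituent left with no children.
    Internal nodes are (label, [children]); leaves are (tag, 'word')."""
    label, children = tree
    if isinstance(children, str):                 # a leaf
        return None if label == '-NONE-' else tree
    kept = [c for c in (remove_empties(ch) for ch in children) if c is not None]
    return (label, kept) if kept else None

def strip_function_tags(label):
    """NP-SBJ-2 -> NP, but leave -NONE-, -LRB-, -RRB- alone."""
    if label in ('-NONE-', '-LRB-', '-RRB-'):
        return label
    return label.split('=')[0].split('-')[0]

# (S (NP-SBJ (-NONE- *T*)) (VP (VB go)))  ->  (S (VP (VB go)))
tree = ('S', [('NP-SBJ', [('-NONE-', '*T*')]), ('VP', [('VB', 'go')])])
print(remove_empties(tree))
# -> ('S', [('VP', [('VB', 'go')])])
```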
-- All --
Number of sentence = 2416
Number of Error sentence = 1
Number of Skip sentence = 0
Number of Valid sentence = 2415
Bracketing Recall = 86.53
Bracketing Precision = 86.85
Complete match = 32.88
Average crossing = 1.24
No crossing = 61.16
2 or less crossing = 81.95
Tagging accuracy = 96.67
-- len<=40 --
Number of sentence = 2245
Number of Error sentence = 1
Number of Skip sentence = 0
Number of Valid sentence = 2244
Bracketing Recall = 87.11
Bracketing Precision = 87.53
Complete match = 35.16
Average crossing = 1.06
No crossing = 63.95
2 or less crossing = 84.27
Tagging accuracy = 96.63
Thanks...Tom
Hi, Thanks for your reply and your results.
There are also 39832 training sentences in my training file, and the first sentence is
(TOP (S (PP (IN In)(NP (NP (DT an)(NNP Oct.)(CD 19)(NN review))(PP (IN of)(NP (`` ``)(NP (DT The)(NN Misanthrope))('' '')(PP (IN at)(NP (NP (NNP Chicago)(POS 's))(NNP Goodman)(NNP Theatre)))))(PRN (-LRB- -LRB-)(`` ``)(S (NP (VBN Revitalized)(NNS Classics))(VP (VBP Take)(NP (DT the)(NN Stage))(PP (IN in)(NP (NNP Windy)(NNP City)))))(, ,)('' '')(NP (NN Leisure)(CC &)(NNS Arts))(-RRB- -RRB-))))(, ,)(NP (NP (NP (DT the)(NN role))(PP (IN of)(NP (NNP Celimene))))(, ,)(VP (VBN played)(PP (IN by)(NP (NNP Kim)(NNP Cattrall))))(, ,))(VP (VBD was)(VP (ADVP (RB mistakenly))(VBN attributed)(PP (TO to)(NP (NNP Christina)(NNP Haag)))))(. .)) )
While preparing the training sentences, I got rid of function tags, -NONE- nodes, etc.
I used the following command to train the models:
opennlp.tools.parser.ParserME -dict -tag -chunk -build -check E:\eclipse\OPENLP\PENN_Train.MRG E:\eclipse\OPENLP\English\parser 100 5
where PENN_Train.MRG is the file with 39832 sentences.
And I used this command to test the performance:
opennlp.tools.lang.english.TreebankParser -d -bs 20 -ap 0.95 E:\eclipse\OPENLP\english\parser E:\eclipse\OPENLP\PENN_Test_Raw.txt
where PENN_Test_Raw.txt is the file with test sentences. One sentence per line in the file.
Meanwhile, I guess the low performance I got could be attributed to low POS tagging accuracy. It might have something to do with the tagdict file. I didn't prepare the file myself; I just used the tagdict file downloaded from http://opennlp.sourceforge.net/models/english/parser/.
Thanks!
Or could you tell me the commands you used to get the 86.53 recall and 86.85 precision?
Thanks.
Hi,
Sorry it's taken me a few days to respond. I somehow missed your previous post too.
I have remembered, though, that if you don't have the TOP nodes in the gold evaluation file as well, you'll get poor recall results, as EVALB will count the TOP nodes as missing brackets:
EVALB/evalb -p EVALB/COLLINS.prm 23.gold 23.opennlp-1.3
With that you get:
-- All --
Number of sentence = 2416
Number of Error sentence = 1
Number of Skip sentence = 0
Number of Valid sentence = 2415
Bracketing Recall = 82.06
Bracketing Precision = 86.85
Complete match = 0.00
Average crossing = 1.24
No crossing = 61.16
2 or less crossing = 81.95
Tagging accuracy = 96.67
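So the fix for the recall numbers is simply to wrap each gold tree in a TOP node before running evalb; for example (a sketch, assuming one tree per line in the gold file; the commented filenames just echo the command above):

```python
def add_top(tree_line):
    """Wrap one bracketed tree in a TOP node to match the parser output."""
    tree_line = tree_line.strip()
    return '(TOP %s)' % tree_line if tree_line else ''

print(add_top('(S (NP (PRP I)) (VP (VB agree)) (. .))'))
# -> (TOP (S (NP (PRP I)) (VP (VB agree)) (. .)))

# e.g. rewrite the gold file before scoring:
# with open('23.gold') as f, open('23.gold.top', 'w') as out:
#     out.writelines(add_top(line) + '\n' for line in f if line.strip())
```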
This still doesn't explain your tagging accuracy issue. I doubt it's the tag dictionary, as it's actually a little better than what you'll get out of the treebank data by default.
Your commands look ok, although since you are using the default values you might try them without all the options. (I see that for training there is a bug where you need to have at least three options.)
training: java -mx1800M opennlp.tools.parser.ParserME wsj.train.ustrip models 100 5
testing: java -mx500M opennlp.tools.lang.english.TreebankParser -d models < 23.tok > 23.opennlp-1.3
I'll re-train this on that data and report back the training output in the next day or so (it takes a while, so I'll run it tonight).
Hope this helps...Tom
Hi,
I noticed that there is something wrong with the tagger training when it is done via the parser. For the above post I kicked off the job to make sure I had the command right, and saw that the tagger was converging slowly. I've always trained the tagger separately in the past, so I had never encountered this.
Here's what I did to train the tagger:
perl -ne 'while(/\(([^()]+)\)/g) { @parts=split(/ /,$1); print "$parts[1]_$parts[0] ";} print "\n";' wsj.train.ustrip | perl -ne 's/\s+$//; print "$_\n";' > wsj.train.tag
java -mx500M opennlp.tools.postag.POSTaggerME wsj.train.tag tag.bin.gz
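For reference, the perl pipeline above can be expressed as a short Python equivalent (same word_TAG output format; a sketch only):

```python
import re

def tree_to_tagged(tree_line):
    """Extract the (TAG word) leaves of a bracketed tree and emit the
    word_TAG format used for POSTaggerME training, mirroring the perl
    one-liner above."""
    leaves = re.findall(r'\(([^()\s]+) ([^()\s]+)\)', tree_line)
    return ' '.join('%s_%s' % (word, tag) for tag, word in leaves)

print(tree_to_tagged('(TOP (S (NP (PRP I)) (VP (VB agree))) (. .))'))
# -> I_PRP agree_VB ._.
```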
I'll look for what the issue is with training the tagger via the parser.
Thanks...Tom
Hi,
You can fix the bug in tag training with the following change to opennlp.tools.parser.ParserEventStream.java:
68 else if (etype == EventTypeEnum.TAG) {
69 this.tcg = new DefaultPOSContextGenerator(null);
70 }
Pass null to the DefaultPOSContextGenerator constructor rather than the dictionary.
Hope this helps...Tom