Jorn,
I got the CoNLL 03 data converters in place and working. Wow, this is so much easier to extend now... anyway, I hope you can get the data as well. I'd like some outside verification that the converter's output is fully correct.
The CoNLL 03 data also includes POS tags for the sentences. Would it be useful to create a parser for the POS engine as well?
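For reference, each non-blank line in the English CoNLL 03 files carries four whitespace-separated columns: token, POS tag, syntactic chunk tag, and named-entity tag. A minimal sketch of reading one sentence (the function name and sample lines are mine, not the converter code discussed here):

```python
def parse_conll03_sentence(lines):
    """Parse one sentence's worth of English CoNLL 03 lines into
    (token, pos, chunk, ne) tuples. A line looks like:
    'U.N. NNP I-NP I-ORG'."""
    rows = []
    for line in lines:
        line = line.strip()
        # Blank lines separate sentences; -DOCSTART- marks document boundaries.
        if not line or line.startswith("-DOCSTART-"):
            continue
        token, pos, chunk, ne = line.split()
        rows.append((token, pos, chunk, ne))
    return rows

sample = [
    "U.N. NNP I-NP I-ORG",
    "official NN I-NP O",
    "Ekeus NNP I-NP I-PER",
]
for tok, pos, chunk, ne in parse_conll03_sentence(sample):
    print(tok, ne)
```

A POS converter could reuse the same column split and simply emit the token/POS pair instead of the token/NE pair.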
James
Well, after training, I attempted evaluation and got these numbers. Are they any good?
[code]
Loading Token Name Finder model ... done (2.106s)
current: 176.1 sent/s avg: 176.1 sent/s total: 185 sent
current: 616.2 sent/s avg: 384.7 sent/s total: 774 sent
current: 439.5 sent/s avg: 401.9 sent/s total: 1210 sent
current: 604.2 sent/s avg: 452.2 sent/s total: 1813 sent
current: 760.5 sent/s avg: 513.9 sent/s total: 2573 sent
current: 505.5 sent/s avg: 512.5 sent/s total: 3078 sent
Average: 510.8 sent/s
Total: 3251 sent
Runtime: 6.365s
Precision: 0.9373834886817577
Recall: 0.6596091205211726
F-Measure: 0.7743388353801384
[/code]
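As a sanity check on the numbers above, the reported F-Measure is just the harmonic mean of the reported precision and recall:

```python
# Values taken from the evaluation log above.
precision = 0.9373834886817577
recall = 0.6596091205211726

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f1)  # prints about 0.7743, matching the reported F-Measure
```

So the three numbers are internally consistent; the low F-Measure is driven almost entirely by the low recall.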
Okay, I've verified that the model is working. I'm still having significant problems with the detectors on the sample I sent Jörn, though.
Anyway, the data seems good, and I'll assume so until I get some outside verification.
I'll also look into the POS parser and see whether I can just reuse the ConllxPOS... parsers if the formats are the same. I almost felt bad writing Conll03... for the current set, since it doesn't differ much from the older Conll02... set.
James
I found a bug in the code I submitted... after a little reading I found out the 'B-' prefix is actually used, and I found an instance of it in the training set.
I've fixed the bug and marked a TODO item for the Conll 03 parser. If we want to train for multiple types in a single model, there is a problem with entities of different types appearing next to each other, since the 'B-' prefix is only used when adjacent entities share the same type.
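To illustrate the issue: in the original CoNLL 03 (IOB1) tagging, 'B-' only appears when an entity directly follows another entity of the same type; adjacent entities of different types both start with 'I-', so the type change itself has to mark the boundary. A sketch (function name is mine) of turning IOB1 tags into spans under that convention:

```python
def iob1_spans(tags):
    """Extract (start, end, type) entity spans from IOB1 tags.
    A span ends on 'O', on a 'B-' prefix, or on a type change."""
    spans, start, cur = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel flushes the last span
        prefix, _, etype = tag.partition("-")
        if cur is not None and (prefix in ("O", "B") or etype != cur):
            spans.append((start, i, cur))
            start, cur = None, None
        if prefix in ("B", "I") and cur is None:
            start, cur = i, etype
    return spans

# Same-type adjacency: the 'B-' prefix marks the boundary between two PER spans.
print(iob1_spans(["I-PER", "B-PER"]))  # two PER spans
# Different-type adjacency: the type change alone marks the boundary.
print(iob1_spans(["I-PER", "I-ORG"]))  # one PER span, then one ORG span
```

This is why a single-type model can get away with treating 'I-' as a plain continuation, while a multi-type model has to handle the type-change case explicitly.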
I reviewed your wiki page; would it be possible to add your evaluation results to it? In my opinion this would be really helpful for others, because they could then compare their own results (maybe after modifying the code) to yours.
In the results you reported, the recall seems very low. Maybe we can compare against the other results reported for Conll03 to see where we stand.
Jörn
I'm adding the logic for the German data format. I'll leave testing to someone who has that corpus and can validate the model.
Issue is now migrated to ASF:
https://issues.apache.org/jira/browse/OPENNLP-15