Can the coreference package find definite noun phrase anaphora?
I am using the OpenNLP coref package. I have tried some examples with this package and found that it can find the relations in pronominal anaphora. For example:
<strong>Researchers</strong> from many different places attended the conference. <strong>They</strong> discuss experiment results with each other after the meeting.
The package successfully found the relation between "Researchers" and "They".
But it can't find the relations in definite noun phrase anaphora. For example:
<strong>Bioengineering researchers</strong> from many different places attended the conference. <strong>The participants</strong> discuss experiment results with each other.
It can't find the relation between "Bioengineering researchers" and "The participants" in this example.
Any help appreciated.
Jianlee
Hi,
The module does try and resolve these but misses this case. Specifically, it's only 23% sure these are coreferent.
The participants -> [ Bioengineering researchers from many different places ] (male,male) 0.2338269273799131 [default, sim.compatible, gen.compatible, num.compatible, all.compatible, pt=BOS, pw=BOS, nt=VBP, nw=discuss, bnt=VBP, bnw=discuss, hd=0, de=3, ds=1]
Definite NPs are much harder than pronouns and often don't have explicit referents if the entity is inferable: "We walked up to the house. The door was open." This is even more the case with plural definite NPs (like the one in your example), since the antecedent may be split: "Tom went to the store and met Bill. The boys then had lunch together." The annotated data on which this model was trained doesn't account for such phenomena.
The model is essentially playing it safe and not positing a relationship here because in many other cases this is correct.
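In other words, the resolver is applying a decision threshold to that probability. A minimal sketch of the idea in Python, where the 0.5 cutoff is a hypothetical illustration and not OpenNLP's actual internal setting:

```python
# Hypothetical illustration of threshold-based coreference linking.
# The 0.2338... score comes from the debug output above; the cutoff
# value is assumed for illustration, not taken from OpenNLP.

def link_if_confident(score, threshold=0.5):
    """Posit a coreference link only when the model's probability
    for the candidate antecedent clears the threshold."""
    return score >= threshold

# "The participants" -> "Bioengineering researchers ..." scored ~0.23,
# so under a 0.5-style cutoff no link is posited.
print(link_if_confident(0.2338269273799131))                 # False
print(link_if_confident(0.2338269273799131, threshold=0.2))  # True
```

With a lower cutoff the link would survive, but so would many spurious ones, which is the precision trade-off being described here.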
Hope this helps...Tom
Anaphora resolution systems rely on syntactic, semantic or statistical clues to identify the antecedent of an anaphor.
I am wondering which strategies are used in your anaphora resolution system. Can you give me some references about it?
Thank you...
Jianlee
Hi,
You'll probably find this thread and the referenced information helpful.
https://sourceforge.net/forum/forum.php?thread_id=1456314&forum_id=9943
Hope this helps, post back if it doesn't. Thanks..Tom
Aha! Tom, your answer is enlightening -- can you tell me from what class/data structure I can retrieve the bracketed information:
The participants -> [ Bioengineering researchers from many different places ] (male,male) 0.2338269273799131 [default, sim.compatible, gen.compatible, num.compatible, all.compatible, pt=BOS, pw=BOS, nt=VBP, nw=discuss, bnt=VBP, bnw=discuss, hd=0, de=3, ds=1]
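One stopgap, while waiting for a pointer to the right class, is to capture that debug line as text and parse it yourself. A rough sketch, assuming the format shown above (anaphor -> [ antecedent ] (gender,gender) score [feature, ...]); the field names here are my own, not OpenNLP's:

```python
import re

def parse_coref_debug(line):
    """Split one OpenNLP coref debug line into its parts.
    Format assumed from the example quoted in this thread:
    anaphor -> [ antecedent ] (g1,g2) score [feat1, feat2, ...]"""
    m = re.match(
        r"(?P<anaphor>.+?) -> \[ (?P<antecedent>.+?) \] "
        r"\((?P<g1>\w+),(?P<g2>\w+)\) (?P<score>[0-9.]+) "
        r"\[(?P<features>.+)\]",
        line,
    )
    if m is None:
        return None
    d = m.groupdict()
    d["score"] = float(d["score"])
    d["features"] = [f.strip() for f in d["features"].split(",")]
    return d

line = ("The participants -> [ Bioengineering researchers from many "
        "different places ] (male,male) 0.2338269273799131 "
        "[default, sim.compatible, gen.compatible, num.compatible, "
        "all.compatible, pt=BOS, pw=BOS, nt=VBP, nw=discuss, "
        "bnt=VBP, bnw=discuss, hd=0, de=3, ds=1]")
info = parse_coref_debug(line)
print(info["anaphor"], info["score"], len(info["features"]))
```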
I assume there's some confidence setting somewhere that drops this anaphor due to the low confidence; so where is the setting, and how can I change it?
Extended commentary: suppose I'm comparing the outputs of N anaphora and/or coref resolution codes, and suppose I also have additional data (e.g., a semantic tagger that I think is the cat's pajamas). In that case I would want to know everything that OpenNLP found -- not only what was left after discarding -- so that I can compare everything OpenNLP has deduced with everything all the other codes have deduced, and then apply my own reconciliation/dropping strategy. At the moment my goal is very high precision, and my heuristic is: if everyone agrees that 'x' is true, then it is most likely that 'x' is indeed true.
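That high-precision heuristic -- keep only the links every system agrees on -- amounts to a set intersection over each system's proposed (anaphor, antecedent) pairs. A minimal sketch; the system names and candidate links below are invented for illustration:

```python
# Sketch of a "keep only what every system agrees on" reconciliation.
# The system names and candidate links are invented for illustration.

def unanimous_links(outputs):
    """Intersect the (anaphor, antecedent) link sets proposed by
    each coreference system; keep a link only if all systems agree."""
    sets = [set(links) for links in outputs.values()]
    return set.intersection(*sets) if sets else set()

outputs = {
    "opennlp":  {("They", "Researchers"),
                 ("The participants", "Bioengineering researchers")},
    "system_b": {("They", "Researchers")},
    "system_c": {("They", "Researchers"),
                 ("The door", "the house")},
}
print(unanimous_links(outputs))  # only the link all three propose
```

This is exactly why having OpenNLP's full candidate list matters: a link dropped inside one system can never enter the vote.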
thanks,
David H.
Center for Applied Scientific Computing (CASC)
Lawrence Livermore National Lab. (LLNL)