I am a student from Australia and have recently taken a short course in NLP. As a project I am looking at different methods for disambiguation in text. It would be nice if I could find a parser that detects ambiguities for me. I read on a page (the OpenNLP Tools page I believe) that Maxent could be used to detect such ambiguities? Is this the case and if so, any clues as to how I might go about this?
Sorry if this is all a bit vague as I am very much a newbie in this whole area.
Thanks!
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Hi,
You could do this with the existing parser. Are you looking for sentence which we're hard for the parser to parse or specific attachment decisions that the parser makes and found hard?
If the former then I would look at the top n parses and at how much spread there is in the probability assigned to each of them. When the top two or three parses are pretty close to one another that suggests that the parser isn't very confident about which parse is the best.
If you want to look at specific attachment decisions you would need to look at the distributions output from the build and check models for a specific parse.
Hope this helps...Tom
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Yes I am just looking at finding sentences which may have multiple meanings (such as "She saw the boy with the telescope" - either she was using a telescope to see the boy, or she saw a boy and the boy had a telescope).
So the parser achieves this? Again, please excuse my total inexperience with OpenNLP or Maxent, looking at the source code I would use the ParserME class as my best option? In the Usage it suggests that I need to train the parser first? Or can I use some existing grammar?
If the parser found these sentences in a body of text with some number of top parses, then I could analyse these for my project.
Thanks again for your help and patience.
Jared.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Please disregard my last message Tom. I believe I have figured out what I need to do.
I am having problems with the English/Parser model however as the Build.Bin.gz file (after extraction) is suggesting that it is corrupt or something like that (this results in an IOException: Not in GZIP format)?
Any suggestions?
I'm on a Windows machine if that helps.
Thanks.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Hi.
The file need to be gzipped. Some application associated with the gz extension is being launched by your browser and is uncompressing the file. Sometime downloading it by right clicking can avoid this in windows. Hope this helps...Tom
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Hi
I am a student from Australia and have recently taken a short course in NLP. As a project I am looking at different methods for disambiguation in text. It would be nice if I could find a parser that detects ambiguities for me. I read on a page (the OpenNLP Tools page I believe) that Maxent could be used to detect such ambiguities? Is this the case and if so, any clues as to how I might go about this?
Sorry if this is all a bit vague as I am very much a newbie in this whole area.
Thanks!
Hi,
You could do this with the existing parser. Are you looking for sentence which we're hard for the parser to parse or specific attachment decisions that the parser makes and found hard?
If the former then I would look at the top n parses and at how much spread there is in the probability assigned to each of them. When the top two or three parses are pretty close to one another that suggests that the parser isn't very confident about which parse is the best.
If you want to look at specific attachment decisions you would need to look at the distributions output from the build and check models for a specific parse.
Hope this helps...Tom
Hi Tom
Thanks for your quick reply.
Yes I am just looking at finding sentences which may have multiple meanings (such as "She saw the boy with the telescope" - either she was using a telescope to see the boy, or she saw a boy and the boy had a telescope).
So the parser achieves this? Again, please excuse my total inexperience with OpenNLP or Maxent, looking at the source code I would use the ParserME class as my best option? In the Usage it suggests that I need to train the parser first? Or can I use some existing grammar?
If the parser found these sentences in a body of text with some number of top parses, then I could analyse these for my project.
Thanks again for your help and patience.
Jared.
Please disregard my last message Tom. I believe I have figured out what I need to do.
I am having problems with the English/Parser model however as the Build.Bin.gz file (after extraction) is suggesting that it is corrupt or something like that (this results in an IOException: Not in GZIP format)?
Any suggestions?
I'm on a Windows machine if that helps.
Thanks.
Hi.
The file need to be gzipped. Some application associated with the gz extension is being launched by your browser and is uncompressing the file. Sometime downloading it by right clicking can avoid this in windows. Hope this helps...Tom