Re: [Senserelate-users] Identifying Synset from SenseRelate::AllWords Output

Hi Marc,

Yes, you should be able to do this. I don't really use NLTK for
similarity measurements, so I'm not sure what issues you might have
encountered, but the form of word senses output by SenseRelate can be
used directly as input to WordNet::Similarity. Both use the wps form,
which looks like this: word#pos#sense. But you'd need to have a
program that would take that SenseRelate output (along with your
target words) and create those pairs of word senses that get input to
WordNet::Similarity (via the --file option, as in the example quoted
below).
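
To make that concrete, here is a rough sketch of such a program in
Python (since that is what you are already scripting in). The function
name make_pairs and the regular expression are my own placeholders, not
part of SenseRelate or WordNet::Similarity, and I am assuming the
disambiguated sentence arrives as one whitespace-separated line in
which only fully tagged words have the word#pos#sense form:

```python
import re

# Matches the word#pos#sense (wps) form, e.g. "pilot#n#1".
# Assumption: pos is one of n, v, a, r.
WPS = re.compile(r"^[^#\s]+#[nvar]#\d+$")


def make_pairs(disambiguated, targets):
    """Pair every sense-tagged token in the sentence with every target sense.

    Returns lines of the form "sentence_sense target_sense", grouped by target.
    """
    senses = [tok for tok in disambiguated.split() if WPS.match(tok)]
    return [f"{sense} {target}" for target in targets for sense in senses]


sentence = "he be#v#1 the great#a#1 pilot#n#1 in the airline#n#2"
targets = ["achiever#n#1", "best#a#1", "greatness#n#1"]

# One pair per line; redirect this to a file to use it with --file.
print("\n".join(make_pairs(sentence, targets)))
```

Redirecting the printed lines to a file gives something you can pass to
similarity.pl via --file. Tokens without a full wps tag (he, the, in)
are simply skipped.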

So, there should be no problem directly inputting SenseRelate output
into WordNet::Similarity; you'd just need a small program to put
things into the proper format (which is shown earlier in this thread).
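
For the step you described earlier in this thread (finding, for each
target sense, the most similar sense in the sentence), the scores
printed by similarity.pl can be read back with another short script.
This is only a sketch: best_per_target is a made-up name, it assumes
result lines have exactly three fields (sense, target, score) as in the
lch output quoted below, and it treats negative values such as -1000000
as "no score", which is my reading of that output rather than
documented behavior:

```python
import re

# Matches the word#pos#sense (wps) form, e.g. "pilot#n#1".
WPS = re.compile(r"^[^#\s]+#[nvar]#\d+$")


def best_per_target(lines):
    """Return {target: (best_sentence_sense, score)} from similarity.pl output."""
    best = {}
    for line in lines:
        fields = line.split()
        # Skip blank lines, warnings, and the two-field echo of each pair.
        if len(fields) != 3:
            continue
        if not (WPS.match(fields[0]) and WPS.match(fields[1])):
            continue
        try:
            score = float(fields[2])
        except ValueError:
            continue
        # Assumption: a negative value is an error marker, not a similarity.
        if score < 0:
            continue
        sense, target = fields[0], fields[1]
        # Keep the first sense seen when two senses tie for the best score.
        if target not in best or score > best[target][1]:
            best[target] = (sense, score)
    return best


sample = """\
be#v#1  achiever#n#1  -1000000
pilot#n#1  achiever#n#1  1.89711998488588
airline#n#2  achiever#n#1  1.04982212449868
pilot#n#1  greatness#n#1  0.980829253011726
airline#n#2  greatness#n#1  0.980829253011726
""".splitlines()

for target, (sense, score) in best_per_target(sample).items():
    print(f"{target}\t{sense}\t{score:.2f}")
```

A target with no usable score (best#a#1 under lch, for instance) just
does not appear in the result, so you would need to decide how to
record that case in your csv file.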

I hope that makes some sense. Please feel free to follow up if it doesn't!

Good luck,
Ted

On Sun, Mar 22, 2015 at 2:18 PM, Marc Halusic <mha...@gm...> wrote:
> Hi Ted,
> Now that I've had more time to look through your advice, I realized that we
> had moved away from my original question, which is whether it is possible to
> use output from WordNet::SenseRelate::AllWords as the intermediate step in
> the process we were discussing.  That is, if I were to use
> WordNet::Similarity in the way that you recommended, would I be able to use
> Wordnet::SenseRelate::AllWords to disambiguate the sentence "he was the
> greatest pilot in the airline" into something like "he be#v#1 the great#a#1
> pilot#n#1 in the airline#n#2" that I could then use in WordNet::Similarity?
>
> The problem I had run into earlier when I tried a test run with a short
> sentence using similarity metrics in NLTK was that one word was
> disambiguated by SenseRelate as "working#v#20", which wasn't recognized by
> wordnet when input in that form.  I assume this is related to the feature of
> SenseRelate that words are kept in their original form whenever possible.
> Is there any way to use SenseRelate to automatically identify the synset
> sense of each word such that I could then input those synsets into
> WordNet::Similarity in the way that you suggested?
>
> Best,
>
> Marc
>
> Marc Halusic
> Graduate Student
> University of Missouri-Columbia
> Social and Personality Psychology
>
>
>
>
>
>
> On Mon, Mar 16, 2015 at 2:19 PM, Marc Halusic <mha...@gm...> wrote:
>>
>> Wow, thanks Ted!  I think that should work, though I'll need some time to
>> pore over your response to make sure I completely understand it.  I'll let
>> you know if I run into any roadblocks.
>>
>>
>> Thanks Again!
>>
>> Marc
>>
>> Marc Halusic
>> Graduate Student
>> University of Missouri-Columbia
>> Social and Personality Psychology
>>
>>
>>
>>
>> On Mon, Mar 16, 2015 at 1:54 PM, Ted Pedersen <dul...@gm...> wrote:
>>>
>>> Hi Marc,
>>>
>>> I think what you are describing can nearly be done with
>>> WordNet::Similarity, although it will not produce output in precisely the
>>> format you described.
>>>
>>> The first thing you'd need to provide would be an input file which
>>> contains your target words (in the second column) and the words from your
>>> disambiguated sentence (in the first column).
>>>
>>> ukko(19): cat simtest.txt
>>> be#v#1 achiever#n#1
>>> great#a#1 achiever#n#1
>>> pilot#n#1 achiever#n#1
>>> airline#n#2 achiever#n#1
>>> be#v#1 best#a#1
>>> great#a#1 best#a#1
>>> pilot#n#1 best#a#1
>>> airline#n#2 best#a#1
>>> be#v#1 greatness#n#1
>>> great#a#1 greatness#n#1
>>> pilot#n#1 greatness#n#1
>>> airline#n#2 greatness#n#1
>>>
>>> When you input this file to WordNet::Similarity, it will find the
>>> similarity of all these pairs. So I think my first 4 lines of file input
>>> would produce the row you'd like for achiever#n#1, and so on.
>>>
>>> ukko(18): similarity.pl --type WordNet::Similarity::lch --file simtest.txt
>>> Loading WordNet... done.
>>> Loading Module... done.
>>> be#v#1  achiever#n#1
>>> Warning (WordNet::Similarity::lch::parseWps()) - be#v and achiever#n
>>> belong to different parts of speech.
>>> be#v#1  achiever#n#1  -1000000
>>>
>>> great#a#1  achiever#n#1
>>> Possible part(s) of speech of word(s) cannot be handled by module.
>>>
>>> pilot#n#1  achiever#n#1
>>> pilot#n#1  achiever#n#1  1.89711998488588
>>>
>>> airline#n#2  achiever#n#1
>>> airline#n#2  achiever#n#1  1.04982212449868
>>>
>>> be#v#1  best#a#1
>>> Possible part(s) of speech of word(s) cannot be handled by module.
>>>
>>> great#a#1  best#a#1
>>> Possible part(s) of speech of word(s) cannot be handled by module.
>>>
>>> pilot#n#1  best#a#1
>>> Possible part(s) of speech of word(s) cannot be handled by module.
>>>
>>> airline#n#2  best#a#1
>>> Possible part(s) of speech of word(s) cannot be handled by module.
>>>
>>> be#v#1  greatness#n#1
>>> Warning (WordNet::Similarity::lch::parseWps()) - be#v and greatness#n
>>> belong to different parts of speech.
>>> be#v#1  greatness#n#1  -1000000
>>>
>>> great#a#1  greatness#n#1
>>> Possible part(s) of speech of word(s) cannot be handled by module.
>>>
>>> pilot#n#1  greatness#n#1
>>> pilot#n#1  greatness#n#1  0.980829253011726
>>>
>>> airline#n#2  greatness#n#1
>>> airline#n#2  greatness#n#1  0.980829253011726
>>>
>>> So, I realize this isn't exactly what you are describing, but I *think*
>>> it is at least close?
>>>
>>> I hope this is helpful, but certainly feel free to follow up with
>>> questions as needed. It sounds like an interesting application, and
>>> certainly we'd like to see it work out for you!
>>>
>>> Cordially,
>>> Ted
>>>
>>>
>>>
>>> On Mon, Mar 16, 2015 at 12:09 PM, Marc Halusic <mha...@gm...>
>>> wrote:
>>>>
>>>> Hi Ted,
>>>> That's close, but just to make sure we're on the same page, here is what
>>>> I mean:
>>>>
>>>> Once the sentence is disambiguated into "he be#v#1 the great#a#1
>>>> pilot#n#1 in the airline#n#2", use a particular similarity measure (let's
>>>> use Leacock-Chodorow in this example, which I will abbreviate as LCH) to
>>>> find the most similar synset to each of my target synsets, and print the
>>>> similarity metric.  So in this example, I would be interested in converting
>>>> the sentence "He was the greatest pilot in the airline" and my target
>>>> synsets of "achiever#n#1", "best#a#1", and "greatness#n#1" into a row in a
>>>> csv file that would look something like:
>>>>
>>>> "he be#v#1 the great#a#1 pilot#n#1 in the airline#n#2",
>>>> "LCH_achiever#n#1=1.85", "LCH_best#a#1=1.0", "LCH_greatness#n#1=0.93"
>>>>
>>>> where 1.85 is the LCH similarity between achiever#n#1 and pilot#n#1 (I'm
>>>> assuming those are the closest, because of part of speech constraints), 1.0
>>>> is the LCH similarity between best#a#1 and great#a#1, and 0.93 is the LCH
>>>> similarity between greatness#n#1 and pilot#n#1.
>>>>
>>>> So identifying the most similar synset in a sentence to a given target
>>>> synset is a step on the way to my main aim, which is to answer the questions
>>>> "for the target synset 'achiever#n#1', how similar is the most similar word
>>>> in this sentence", and "for the target synset 'greatness#n#1', how similar
>>>> is the most similar word in this sentence".
>>>>
>>>> I hope that is clear, but please let me know if there is anything that I
>>>> have left ambiguous.
>>>>
>>>> Best,
>>>>
>>>> Marc
>>>>
>>>> Marc Halusic
>>>> Graduate Student
>>>> University of Missouri-Columbia
>>>> Social and Personality Psychology
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Sun, Mar 15, 2015 at 5:35 PM, Ted Pedersen <dul...@gm...>
>>>> wrote:
>>>>>
>>>>> Hi Marc,
>>>>>
>>>>> Thanks for the details. Let me just see if I can construct an
>>>>> example...
>>>>>
>>>>> suppose your sentence is
>>>>>
>>>>> He was the greatest pilot in the airline.
>>>>>
>>>>> This is disambiguated as follows...
>>>>>
>>>>> he be#v#1 the great#a#1 pilot#n#1 in the airline#n#2
>>>>>
>>>>> Now, you want to compare each of the above synsets with your target
>>>>> synsets
>>>>>
>>>>> achiever#n#1
>>>>> best#a#1
>>>>> greatness#n#1
>>>>>
>>>>> and determine which synset in the sentence (be#v#1, great#a#1,
>>>>> pilot#n#1, airline#n#2) is closest to one of your target synsets...?
>>>>>
>>>>> Is that a reasonable example, or am I missing something here?
>>>>>
>>>>> Thanks!
>>>>> Ted
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Mar 13, 2015 at 11:39 AM, Marc Halusic <mha...@gm...>
>>>>> wrote:
>>>>>>
>>>>>> Hi All,
>>>>>> I replied to Ted earlier today, when I should have replied to the
>>>>>> listserv.  So here are my responses to Ted's questions:
>>>>>>
>>>>>> Hi Ted,
>>>>>> Thank you for getting back to me so quickly!  In my message, I was
>>>>>> striving for brevity, but it looks like as a result I was unclear in what I
>>>>>> am trying to accomplish.  I am a psychologist working on automating a coding
>>>>>> procedure for written content.  Specifically, I am trying to categorize
>>>>>> sentences in terms of whether they do or do not contain achievement motive
>>>>>> imagery.  In the past, people have tried to accomplish this using a simple
>>>>>> word-search function, setting up rules such as "if the sentence contains the
>>>>>> word 'best', mark the sentence as containing achievement imagery."
>>>>>> Although such a procedure correctly identifies "he is the world's best
>>>>>> scientist" as having achievement imagery, we get false positives when "best"
>>>>>> is used in different senses, such as when it means most likely ("it was his
>>>>>> best chance") or when it denotes emotional closeness ("they are best
>>>>>> friends").
>>>>>>
>>>>>> So I am attempting a 2-step procedure: first, disambiguate the
>>>>>> sentences into their synsets so I can know the most likely sense in which
>>>>>> each word is meant.  Next, take each synset that I have determined would
>>>>>> denote achievement motive imagery, such as achiever#n#1, best#a#1, and
>>>>>> greatness#n#1 (let's call them target synsets), and compute the similarity
>>>>>> between each target synset and its closest synset in a given sentence.  That
>>>>>> way, I can categorize a sentence as having achievement imagery if it either
>>>>>> contains any of my target synsets, or contains any synsets that are very
>>>>>> similar to any of my target synsets.  I will determine what counts as very
>>>>>> similar through testing with already coded materials.
>>>>>>
>>>>>> I would of course be happy to use WordNet::Similarity to compute the
>>>>>> similarity function, particularly because I like that it contains the option
>>>>>> to perform adapted Lesk relatedness, whereas python's nltk does not, but I
>>>>>> don't know that this helps me with the larger problem of disambiguating
>>>>>> sentences, and then calculating their similarities to words not present in
>>>>>> those sentences.
>>>>>>
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Marc
>>>>>>
>>>>>> Marc Halusic
>>>>>> Graduate Student
>>>>>> University of Missouri-Columbia
>>>>>> Social and Personality Psychology
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Mar 13, 2015 at 7:12 AM, Ted Pedersen <dul...@gm...>
>>>>>> wrote:
>>>>>>>
>>>>>>> If you want to find the similarity between two synsets like cat#n#1
>>>>>>> and dog#n#1, have you considered the use of WordNet::Similarity? This is what
>>>>>>> SenseRelate uses under the hood, and is really set up to do these kinds of
>>>>>>> similarity measurements.
>>>>>>>
>>>>>>> http://wn-similarity.sourceforge.net for details...below are some
>>>>>>> examples of how it can be used.
>>>>>>>
>>>>>>> ukko(32): similarity.pl -type WordNet::Similarity::path cat#n#1 dog#n#1
>>>>>>> Loading WordNet... done.
>>>>>>> Loading Module... done.
>>>>>>> cat#n#1  dog#n#1  0.2
>>>>>>> ukko(33): similarity.pl -type WordNet::Similarity::path --file test
>>>>>>> Loading WordNet... done.
>>>>>>> Loading Module... done.
>>>>>>> cat#n#1  dog#n#1
>>>>>>> cat#n#1  dog#n#1  0.2
>>>>>>>
>>>>>>> mouse#n#1  hat#n#2
>>>>>>> mouse#n#1  hat#n#2  0.0454545454545455
>>>>>>>
>>>>>>> ukko(34): cat test
>>>>>>> cat#n#1 dog#n#1
>>>>>>> mouse#n#1 hat#n#2
>>>>>>>
>>>>>>> That said, if I've misunderstood what you'd like to do, please let me
>>>>>>> know and I'll try again!
>>>>>>>
>>>>>>> Good luck,
>>>>>>> Ted
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Mar 12, 2015 at 3:45 PM, Marc Halusic <mha...@gm...>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi All,
>>>>>>>> I am working on a project that requires that I disambiguate a large
>>>>>>>> number of sentences into collections of synsets that could be recognized by
>>>>>>>> wordnet.  The reason that I need to do this is that, for each sentence, I
>>>>>>>> need to compute similarity scores of a variety of target words against their
>>>>>>>> most similar equivalents in a sentence.  For example, I might want to
>>>>>>>> compare the target word "dog#n#1" against the sentence "the cats ate the
>>>>>>>> fish" and convert the sentence so I can find that "dog#n#1" is most similar
>>>>>>>> to "cat#n#1", and compute how similar those two synsets are (I have a python
>>>>>>>> script that can do this as long as the sentences have been disambiguated
>>>>>>>> into wordnet synsets).  Because the target words are not in the sentences,
>>>>>>>> and are very numerous (around 300), I don't think that using a trace option
>>>>>>>> is quite right for what I am trying to do.  Looking at previous posts, I
>>>>>>>> understand that it is either not easy or not possible to convert SenseRelate
>>>>>>>> output to synsets that could be used in such calculations.  I am therefore
>>>>>>>> curious whether it is impossible, or just difficult, and how difficult it
>>>>>>>> would be.  I am also curious to know if there are better ways that I could
>>>>>>>> perform these calculations with SenseRelate that perhaps I have not thought
>>>>>>>> of yet.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>>
>>>>>>>> Marc
>>>>>>>>
>>>>>>>> Marc Halusic
>>>>>>>> Graduate Student
>>>>>>>> University of Missouri-Columbia
>>>>>>>> Social and Personality Psychology
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> senserelate-users mailing list
>>>>>>>> sen...@li...
>>>>>>>> https://lists.sourceforge.net/lists/listinfo/senserelate-users
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>