Re: [Senserelate-users] Identifying Synset from SenseRelate::AllWords Output

Hi All,
I replied to Ted earlier today when I should have replied to the
list.  So here are my responses to Ted's questions:

Hi Ted,
Thank you for getting back to me so quickly!  In my message, I was striving
for brevity, but as a result I was unclear about what I am trying to
accomplish.  I am a psychologist working on automating a coding
procedure for written content.  Specifically, I am trying to categorize
sentences in terms of whether they do or do not contain achievement motive
imagery.  In the past, people have tried to accomplish this using a simple
word-search function, setting up rules such as "if the sentence contains
the word 'best', mark the sentence as containing achievement imagery."
Although such a procedure correctly identifies "he is the world's best
scientist" as having achievement imagery, we get false positives when
"best" is used in different senses, such as when it means most likely ("it
was his best chance") or when it denotes emotional closeness ("they are
best friends").
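To make the false-positive problem concrete, here is a minimal Python sketch
of the kind of word-search rule described above.  The function name
keyword_rule and its default keyword list are hypothetical, chosen only for
illustration; the three sentences are the examples from this message.

```python
import re


def keyword_rule(sentence, keywords=("best",)):
    """Return True if any keyword occurs as a whole word in the sentence.

    This is the naive rule: it looks only at the surface word and knows
    nothing about which sense of the word is intended.
    """
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    return any(keyword in words for keyword in keywords)


examples = [
    "He is the world's best scientist.",  # achievement imagery
    "It was his best chance.",            # "most likely" sense
    "They are best friends.",             # emotional closeness
]

# Every sentence is flagged, although only the first one should be.
flags = [keyword_rule(sentence) for sentence in examples]
print(flags)
```

All three sentences are flagged because the rule cannot tell the senses of
"best" apart, which is the reason for disambiguating first.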

So I am attempting a 2-step procedure: first, disambiguate the sentences
into their synsets so I can know the most likely sense in which each word
is meant.  Next, take each synset that I have determined would denote
achievement motive imagery, such as achiever#n#1, best#a#1, and
greatness#n#1 (let's call them target synsets), and compute the similarity
between each target synset and its closest synset in a given sentence.
That way, I can categorize a sentence as having achievement imagery if it
either contains any of my target synsets, or contains any synsets that are
very similar to any of my target synsets.  I will determine what counts as
very similar through testing with already coded materials.
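The second step could be sketched roughly as follows in Python.  This is only
an outline under stated assumptions: the sentence is assumed to have already
been disambiguated into word#pos#sense strings, classify_sentence is a
hypothetical helper, and the similarity function is passed in as a parameter
so that any measure can be plugged in.  The scores in TOY_SCORES and the sense
tags in the usage lines are made-up placeholders, not real WordNet values or
real SenseRelate output.

```python
def classify_sentence(sentence_senses, targets, similarity, threshold):
    """Decide whether a disambiguated sentence matches any target synset.

    Returns (has_imagery, best_pair, best_score).  A sentence matches if it
    contains a target synset directly, or if its most similar synset reaches
    the threshold.  best_score is None for a direct hit or an empty sentence.
    """
    targets = sorted(set(targets))

    # Case 1: the sentence contains one of the target synsets itself.
    for sense in sentence_senses:
        if sense in targets:
            return True, (sense, sense), None

    # Case 2: find the closest (target, sentence synset) pair.
    best_pair, best_score = None, float("-inf")
    for target in targets:
        for sense in sentence_senses:
            score = similarity(target, sense)
            if score > best_score:
                best_pair, best_score = (target, sense), score

    if best_pair is None:  # empty sentence or no targets
        return False, None, None
    return best_score >= threshold, best_pair, best_score


# Placeholder similarity: made-up numbers for illustration only.
TOY_SCORES = {("achiever#n#1", "winner#n#1"): 0.5}


def toy_similarity(a, b):
    return TOY_SCORES.get((a, b), TOY_SCORES.get((b, a), 0.0))


TARGETS = ["achiever#n#1", "best#a#1", "greatness#n#1"]

print(classify_sentence(["scientist#n#1", "best#a#1"], TARGETS, toy_similarity, 0.4))
print(classify_sentence(["winner#n#1", "race#n#2"], TARGETS, toy_similarity, 0.4))
```

The threshold of 0.4 is arbitrary here; in practice it would come from
testing against the already coded materials.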

I would of course be happy to use WordNet::Similarity to compute the
similarities, particularly because it offers the option of adapted Lesk
relatedness, whereas Python's NLTK does not.  However, I don't see how this
helps me with the larger problem of disambiguating sentences and then
calculating their similarities to words that are not present in those
sentences.
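If the sentences were already disambiguated, one way to hand the pairs to
similarity.pl would be to generate the input for its --file option and read
the scores back in.  The Python sketch below assumes the file format and the
output layout shown in Ted's examples further down (one pair per line in;
"synset synset score" lines out); pairs_file_text and parse_scores are
hypothetical helper names, and I have not checked the output layout for every
measure.

```python
import itertools


def pairs_file_text(targets, sentence_senses):
    """Build the text of a pairs file: one 'target sense' pair per line."""
    lines = [
        f"{target} {sense}"
        for target, sense in itertools.product(targets, sentence_senses)
    ]
    return "\n".join(lines) + "\n"


def parse_scores(output_text):
    """Collect {(synset1, synset2): score} from similarity.pl output.

    Only lines with exactly three fields whose last field is a number are
    kept, so status lines and the echoed input pairs are skipped.
    """
    scores = {}
    for line in output_text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        try:
            scores[(fields[0], fields[1])] = float(fields[2])
        except ValueError:
            continue  # e.g. "Loading WordNet... done."
    return scores


# Output copied from the example run quoted below.
sample_output = """Loading WordNet... done.
Loading Module... done.
cat#n#1  dog#n#1
cat#n#1  dog#n#1  0.2

mouse#n#1  hat#n#2
mouse#n#1  hat#n#2  0.0454545454545455
"""

print(pairs_file_text(["dog#n#1"], ["cat#n#1", "fish#n#1"]), end="")
print(parse_scores(sample_output))
```

The text returned by pairs_file_text would be written to a file and passed to
similarity.pl with --file; the measure is chosen with -type, as in the quoted
examples.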

Best,

Marc

Marc Halusic
Graduate Student
University of Missouri-Columbia
Social and Personality Psychology

On Fri, Mar 13, 2015 at 7:12 AM, Ted Pedersen <dul...@gm...> wrote:

> If you want to find the similarity between two synsets like cat#n#1 and
> dog#n#1, have you considered the use of WordNet::Similarity? This is what
> SenseRelate uses under the hood, and is really set up to do these kinds of
> similarity measurements.
>
> http://wn-similarity.sourceforge.net for details...below are some
> examples of how it can be used.
>
> ukko(32): similarity.pl -type WordNet::Similarity::path cat#n#1 dog#n#1
> Loading WordNet... done.
> Loading Module... done.
> cat#n#1  dog#n#1  0.2
> ukko(33): similarity.pl -type WordNet::Similarity::path --file test
> Loading WordNet... done.
> Loading Module... done.
> cat#n#1  dog#n#1
> cat#n#1  dog#n#1  0.2
>
> mouse#n#1  hat#n#2
> mouse#n#1  hat#n#2  0.0454545454545455
>
> ukko(34): cat test
> cat#n#1 dog#n#1
> mouse#n#1 hat#n#2
>
> That said, if I've misunderstood what you'd like to do, please let me know
> and I'll try again!
>
> Good luck,
> Ted
>
>
> On Thu, Mar 12, 2015 at 3:45 PM, Marc Halusic <mha...@gm...> wrote:
>
>> Hi All,
>> I am working on a project that requires that I disambiguate a large
>> number of sentences into collections of synsets that could be recognized by
>> wordnet.  The reason that I need to do this is that, for each sentence, I
>> need to compute similarity scores of a variety of target words against
>> their most similar equivalents in a sentence.  For example, I might want to
>> compare the target word "dog#n#1" against the sentence "the cats ate the
>> fish" and convert the sentence so I can find that "dog#n#1" is most similar
>> to "cat#n#1", and compute how similar those two synsets are (I have a
>> python script that can do this as long as the sentences have been
>> disambiguated into wordnet synsets).  Because the target words are not in
>> the sentences, and are very numerous (around 300), I don't think that using
>> a trace option is quite right for what I am trying to do.  Looking at
>> previous posts, I understand that it is either not easy or not possible to
>> convert SenseRelate output to synsets that could be used in such
>> calculations.  I am therefore curious whether it is impossible, or just
>> difficult, and how difficult it would be.  I am also curious to know if
>> there are better ways that I could perform these calculations with
>> SenseRelate that perhaps I have not thought of yet.
>>
>> Best,
>>
>> Marc
>>
>> Marc Halusic
>> Graduate Student
>> University of Missouri-Columbia
>> Social and Personality Psychology
>>
>> _______________________________________________
>> senserelate-users mailing list
>> sen...@li...
>> https://lists.sourceforge.net/lists/listinfo/senserelate-users
>>
>>
>