[Senserelate-developers] followup from nov 18 meeting

Hi Jason,

There were a few issues that came up in our Nov 18 meeting that I wanted
to make sure we remembered as they seem to directly impact how we'll
proceed with WordNet-SenseRelate.

First, we discussed the thresholding that we are performing for the word
sense disambiguation measure, and agreed that we really need it at two
levels. First, there is the level of "pairwise" comparisons, which seems
to be what was in the pseudo-code in the presentation. This threshold
avoids adding very small pairwise relatedness values into the overall
relatedness score. We also need a threshold that checks the overall
relatedness value, and makes sure it is large enough to merit making a
decision in favor of a particular sense. So, these are two distinct
thresholds, and we probably need different options for them. In fact,
both options appear to be in the current version of
SenseRelate/disamb.pl already. Here are their help entries:

 --minScore SCORE   The score of relatedness with the context, of any sense
                     of the target word, must be at least above SCORE for that
                     sense to be even considered a candidate for the answer.

 --scoreThreshold SCORE
                     For given sense of target word, ignore the score from a
                     particular non-target word if it is <= SCORE. Default = 0.

I would actually propose that we rename these as

 --contextScore   (currently --minScore)
 --pairScore      (currently --scoreThreshold)

Hopefully those names are descriptive enough to convey which is which. :)
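
To make the two levels concrete, here is a minimal Python sketch of how
the two thresholds could interact. It is not the disamb.pl code: the
function name, the relatedness callable (which stands in for whatever
measure scores a target sense against a context word), and the argument
names pair_score and context_score are all assumptions for illustration.

```python
from typing import Callable, List, Optional


def disambiguate(
    target_senses: List[str],
    context_words: List[str],
    relatedness: Callable[[str, str], float],  # hypothetical scoring function
    pair_score: float = 0.0,
    context_score: float = 0.0,
) -> Optional[str]:
    """Return the best-scoring sense, or None if no sense clears context_score."""
    best_sense: Optional[str] = None
    best_total = context_score
    for sense in target_senses:
        total = 0.0
        for word in context_words:
            score = relatedness(sense, word)
            # Pairwise threshold: ignore a score that is <= pair_score.
            if score > pair_score:
                total += score
        # Context threshold: the overall score must be above context_score
        # (and above any earlier sense) before we decide in its favor.
        if total > best_total:
            best_sense, best_total = sense, total
    return best_sense
```

Returning None here corresponds to declining to make a decision when no
sense has a large enough overall score.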

Second, we agreed that it would be very interesting to conduct an
experiment that compares lesk with jcn using pos/coercion. The problem
for lesk is how long it takes, while jcn is limited to comparing nouns
with nouns (or verbs with verbs). However, if we are able to
"coerce" words into the part of speech of the target word, we may be able
to use jcn and get just as good results as lesk in less time.

Third, the new derivational relations in WordNet 2.0 may be helpful in
coercing to the desired part of speech.
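
As a rough illustration of what coercion could look like, here is a small
Python sketch. The DERIVED table is toy data made up for the example, not
read from WordNet, and the coerce function and its signature are
assumptions; a real version would look the links up in the derivational
relations.

```python
from typing import Dict, Optional, Tuple

# Hypothetical derivational links: (word, pos) -> {other_pos: related_word}.
# Toy entries only; a real table would come from WordNet's relations.
DERIVED: Dict[Tuple[str, str], Dict[str, str]] = {
    ("decide", "v"): {"n": "decision"},
    ("decision", "n"): {"v": "decide"},
}


def coerce(word: str, pos: str, target_pos: str) -> Optional[str]:
    """Return a form of `word` in `target_pos`, or None if no link is known."""
    if pos == target_pos:
        return word  # already in the desired part of speech
    return DERIVED.get((word, pos), {}).get(target_pos)
```

A context word for which coerce returns None would simply be skipped (or
handed to a measure that can cross parts of speech).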

Fourth, Bano pointed out that in his MS thesis there is a table (based on
WordNet 1.7) that shows that of the 56,000 compounds known to WordNet,
56,000 of them only have one sense. Given this, it seems that pos tagging
of compounds is not too significant an issue, since the sense
disambiguation problem itself isn't too significant (there is only one to
choose from). Therefore a simple heuristic for tagging compounds as nouns
may be reasonable. However, we want to verify that the situation remains
like this in WordNet 2.0.

Fifth, and finally, Bano suggested that we may want to consider using jcn
where we can, and then only using lesk when we need the added ability to
look at different parts of speech. This seems like an excellent
suggestion, and might represent a way to improve performance without
sacrificing quality of results.
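
A minimal Python sketch of that back-off idea follows. The function names
and the jcn and lesk callables are placeholders for the real measures, so
treat this as an assumption-laden outline rather than how it would be
wired into disamb.pl.

```python
from typing import Callable

Measure = Callable[[str, str], float]  # hypothetical measure interface


def jcn_applicable(pos1: str, pos2: str) -> bool:
    """jcn only compares a noun with a noun or a verb with a verb."""
    return pos1 == pos2 and pos1 in ("n", "v")


def relatedness(
    word1: str, pos1: str, word2: str, pos2: str, jcn: Measure, lesk: Measure
) -> float:
    """Use jcn where it applies, and fall back to lesk otherwise."""
    if jcn_applicable(pos1, pos2):
        return jcn(word1, word2)
    return lesk(word1, word2)
```

One thing this sketch does not address: jcn and lesk scores are not on
the same scale, so adding them into one overall score would probably need
some kind of normalization.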

In any case, these are among the interesting points that came up during
the meeting. If I missed anything, please feel free to fill it in!

Thanks,
Ted

 --
Ted Pedersen
http://www.d.umn.edu/~tpederse