From: ted p. <tpederse@d.umn.edu> - 2004-11-19 00:50:40
Hi Jason,

There were a few issues that came up in our Nov 18 meeting that I wanted to make sure we remembered, as they seem to directly impact how we'll proceed with WordNet-SenseRelate.

First, we discussed the thresholding that we are performing for the word sense disambiguation measure, and agreed that we really need it at two levels. One is the level of "pairwise" comparisons, which seems to be what was in the pseudo-code in the presentation. This threshold avoids adding very small pairwise relatedness values into the overall relatedness score. We also need a threshold that checks the overall relatedness value, and makes sure it is large enough to merit making a decision in favor of a particular sense. So, these are two distinct thresholds, and we probably need different options for them.

In fact, both of these options appear to be in the current version of SenseRelate/disamb.pl. Here are their help entries:

  --minScore SCORE
      The score of relatedness with the context, of any sense of the target word, must be at least above SCORE for that sense to be even considered a candidate for the answer.

  --scoreThreshold SCORE
      For given sense of target word, ignore the score from a particular non-target word if it is <= SCORE. Default = 0.

I would actually propose that we rename these as

  --contextScore
  --pairScore

respectively. Hopefully those names are descriptive enough to convey which is which. :)

Second, we agreed that it would be very interesting to conduct an experiment that compares lesk with jcn using pos/coercion. The problem for lesk is how long it takes, while jcn is limited to dealing with nouns (or verbs) exclusively. However, if we are able to "coerce" words into the part of speech of the target word, we may be able to use jcn and get results just as good as lesk's in less time.

Third, the new derivational relations in WordNet 2.0 may be helpful in coercing to the desired part of speech.
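
To make the two thresholds from the first point concrete, here is a minimal Python sketch. It is not the disamb.pl code: the function name `disambiguate`, the `relatedness` callback, and the toy scores are all assumptions made up for illustration. It only follows the semantics in the help entries above, where a pairwise score is ignored if it is <= the pair threshold and a sense is a candidate only if its total is above the context threshold.

```python
from typing import Callable, List, Optional


def disambiguate(
    target_senses: List[str],
    context_words: List[str],
    relatedness: Callable[[str, str], float],  # hypothetical pairwise measure
    pair_score: float = 0.0,     # plays the role of --scoreThreshold / --pairScore
    context_score: float = 0.0,  # plays the role of --minScore / --contextScore
) -> Optional[str]:
    """Return the best sense of the target word, or None if no sense qualifies."""
    best_sense = None
    best_total = context_score  # a winner must score strictly above this
    for sense in target_senses:
        total = 0.0
        for word in context_words:
            r = relatedness(sense, word)
            # Pairwise threshold: skip contributions that are <= pair_score.
            if r > pair_score:
                total += r
        # Context threshold: only accept a sense whose total is large enough.
        if total > best_total:
            best_sense, best_total = sense, total
    return best_sense


# Toy relatedness values, invented purely for this example.
TOY_SCORES = {
    ("bank#1", "money"): 0.8,
    ("bank#1", "river"): 0.05,
    ("bank#2", "money"): 0.02,
    ("bank#2", "river"): 0.3,
}


def toy_relatedness(sense: str, word: str) -> float:
    return TOY_SCORES[(sense, word)]


if __name__ == "__main__":
    senses = ["bank#1", "bank#2"]
    context = ["money", "river"]
    print(disambiguate(senses, context, toy_relatedness,
                       pair_score=0.1, context_score=0.5))
    print(disambiguate(senses, context, toy_relatedness,
                       pair_score=0.1, context_score=1.0))
```

In the toy data the first call picks a sense, while the second returns no answer because no sense's total clears the higher context threshold. That is the "decline to decide" behavior the second threshold is meant to provide.
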

Fourth, Bano pointed out that in his MS thesis there is a table (based on WordNet 1.7) that shows that of the 56,000 compounds known to WordNet, 56,000 of them only have one sense. Given this, it seems that part-of-speech tagging of compounds is not too significant an issue, since the sense disambiguation problem itself isn't too significant (there is only one sense to choose from). Therefore a simple heuristic for tagging compounds as nouns may be reasonable. We want to verify that the situation remains like this in 2.0, however.

Fifth, and finally, Bano suggested that we may want to consider using jcn where we can, and then only using lesk when we need the added ability to look at different parts of speech. This seems like an excellent suggestion, and might represent a way to improve performance without sacrificing quality of results.

In any case, these are among the interesting points that came up during the meeting. If I missed anything, please feel free to fill it in!

Thanks,
Ted

--
Ted Pedersen
http://www.d.umn.edu/~tpederse