From: Ted P. <tpederse@d.umn.edu> - 2008-10-25 16:02:05
Hi Karthick,

I'm glad to know you are finding Text::Similarity useful. I think the main documentation we have about these measures is found here:

http://search.cpan.org/dist/Text-Similarity/lib/Text/Similarity/Overlaps.pm

This gives the formulas that we use in the program. I think in general these are pretty commonly accepted definitions (except perhaps for lesk), so we didn't elaborate a great deal on them. However, I'm happy to add some details as needed.

The lesk measure, in terms of the overlap counting etc. that we do, is probably best described here (in section 7.3):

An Adapted Lesk Algorithm for Word Sense Disambiguation using WordNet (Banerjee and Pedersen) - Appears in the Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics, pp. 136-145, February 17-23, 2002, Mexico City.
http://www.d.umn.edu/~tpederse/Pubs/cicling2002-b.pdf

The other measures I *think* are fairly standard, although if you have doubts about what we have done with them let me know and I can hopefully clarify.

Thanks!
Ted

On Sat, Oct 25, 2008 at 10:45 AM, Karthick Jayaraman <kar...@gm...> wrote:
> Dear Professor,
>
> I am using your Text::Similarity package in one of my current projects. Is
> there any documentation on the details of the metrics such as
> F-Measure, Precision, Recall, Cosine, and Lesk? Kindly let me know.
>
> We are currently using your package to establish the similarity of
> JavaScript programs that undergo certain forms of minor dynamic
> updates.
>
> We would like to cite your package and the reference on the metrics.
>
> --
> Cheers!,
> Karthick Jayaraman
>
> You must do the things you think you cannot do.
> Eleanor Roosevelt
>
> http://web.syr.edu/~kjayaram
>

--
Ted Pedersen
http://www.d.umn.edu/~tpederse
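
For readers who want a concrete picture of what "commonly accepted definitions" of these overlap measures can look like, here is a minimal Python sketch. It is not taken from Text::Similarity: the function name `overlap_scores`, the whitespace tokenization, and the choice of treating the first text as the reference (so recall divides by its length and precision by the second text's length) are all assumptions made for illustration. The Overlaps.pm documentation linked above remains the authority on what the package actually computes, and lesk is left out because it scores phrasal overlaps as described in the cited paper.

```python
from collections import Counter
from math import sqrt


def overlap_scores(text1, text2):
    """Illustrative word-overlap measures; NOT the Text::Similarity code.

    Assumptions (hypothetical, for illustration only):
    - tokens are lowercase whitespace-separated words
    - text1 is the reference: recall divides by its length,
      precision divides by the length of text2
    """
    words1 = text1.lower().split()
    words2 = text2.lower().split()

    # Multiset intersection: a word counts as often as it appears in both texts.
    overlap = sum((Counter(words1) & Counter(words2)).values())

    precision = overlap / len(words2) if words2 else 0.0
    recall = overlap / len(words1) if words1 else 0.0

    # F-measure as the harmonic mean of precision and recall.
    if precision + recall > 0:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0

    # Cosine over overlap counts: overlap / sqrt(len1 * len2).
    if words1 and words2:
        cosine = overlap / sqrt(len(words1) * len(words2))
    else:
        cosine = 0.0

    return {
        "overlap": overlap,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "cosine": cosine,
    }


if __name__ == "__main__":
    scores = overlap_scores("the cat sat on the mat", "the cat ran")
    for name, value in scores.items():
        print(f"{name}: {value}")
```

With the two sample strings, two words overlap ("the" once and "cat" once), so precision is 2/3 and recall is 2/6 under the assumptions stated above.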