Re: [text-similarity-users] Question on your Text::Similarity package

Hi Karthick,

I'm glad to know you are finding Text::Similarity useful...

I think the main documentation we have about these measures is here:

http://search.cpan.org/dist/Text-Similarity/lib/Text/Similarity/Overlaps.pm

This gives the formulas that we use in the program - I think in
general these are pretty commonly accepted definitions (except perhaps
for lesk) so we didn't elaborate a great deal on them. However, I'm
happy to add some details as needed.

The lesk measure, in terms of the overlap counting that we do, is
probably best described here (in section 7.3):

An Adapted Lesk Algorithm for Word Sense Disambiguation using WordNet
(Banerjee and Pedersen) - Appears in the Proceedings of the Third
International Conference on Intelligent Text Processing and
Computational Linguistics, pp. 136-145, February 17-23, 2002, Mexico
City.
http://www.d.umn.edu/~tpederse/Pubs/cicling2002-b.pdf
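
As a rough illustration of the kind of overlap scoring that paper
describes, here is a minimal Python sketch. It assumes the score is
built by repeatedly taking the longest run of consecutive words shared
by the two texts, adding the square of its length, and masking the run
out before looking again. The function names are hypothetical, there is
no stop-word or function-word filtering, and this is not the
Text::Similarity code itself, so please check it against the paper
before relying on it.

```python
def longest_common_run(a, b):
    """Return (length, start_in_a, start_in_b) of the longest shared
    run of consecutive words. Masked positions (None) never match."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] is not None and a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[0]:
                    best = (cur[j], i - cur[j], j - cur[j])
        prev = cur
    return best


def lesk_style_score(text1, text2):
    """Sum of squared lengths of greedily extracted phrase overlaps
    (an assumed reading of the scoring scheme, not the package's code)."""
    a = text1.lower().split()
    b = text2.lower().split()
    score = 0
    while True:
        n, i, j = longest_common_run(a, b)
        if n == 0:
            break
        score += n * n
        # Mask the matched run so it cannot be counted again.
        a[i:i + n] = [None] * n
        b[j:j + n] = [None] * n
    return score
```

With this sketch, a three-word phrase overlap contributes 9 while three
separate single-word overlaps contribute only 3, which is the reason
for squaring: longer shared phrases count for more.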

The other measures I *think* are fairly standard, although if you have
doubts about what we have done with them let me know and I can
hopefully clarify.

Thanks!
Ted

On Sat, Oct 25, 2008 at 10:45 AM, Karthick Jayaraman
<kar...@gm...> wrote:
> Dear Professor,
>
> I am using your Text::Similarity package in one of my current projects. Is
> there any documentation on the details of the metrics such as
> F-Measure, Precision, Recall, Cosine, and Lesk ? Kindly let me know.
>
> We are currently using your package to establish the similarity of
> JavaScript programs that undergo certain forms of minor dynamic
> updates.
>
> We would like to cite your package and the reference on the metrics.
>
> --
> Cheers!,
> Karthick Jayaraman
>
> You must do the things you think you cannot do.
> Eleanor Roosevelt
>
> http://web.syr.edu/~kjayaram
>

-- 
Ted Pedersen
http://www.d.umn.edu/~tpederse