[text-similarity-users] Text Similarity version 0.06 released

We are pleased to announce the release of version 0.06 of Text-Similarity.
This is a module that WordNet-Similarity uses in computing the lesk
measure, and one of the new features in this release is a "lesk" score
that performs our "lesk overlap" calculation on any pair of files or
strings you provide.

As you may recall, the lesk measure takes glosses and compares them for
overlaps (matches), and then scores them by taking the length of each
phrasal match, squaring it, and summing those squares.
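To make the scoring rule concrete, here is a simplified Python sketch of a
greedy longest-first overlap finder plus the raw lesk sum. It repeatedly
takes the longest run of consecutive words shared by both texts, records
its length, and blanks out those positions so they cannot be reused or
bridged by a later match. This is an illustration under stated
assumptions, not the Text-Similarity source: the function names are
hypothetical, stopwords are assumed to be removed already, and the real
module's tokenization and tie-breaking may differ.

```python
def find_overlaps(words1, words2):
    """Return the lengths of phrasal overlaps, longest first (hypothetical helper)."""
    a, b = list(words1), list(words2)
    lengths = []
    while True:
        best_len, best_i, best_j = 0, 0, 0
        for i in range(len(a)):
            for j in range(len(b)):
                # Extend the match while words agree and the position is unused.
                k = 0
                while (i + k < len(a) and j + k < len(b)
                       and a[i + k] is not None and a[i + k] == b[j + k]):
                    k += 1
                if k > best_len:
                    best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return lengths
        lengths.append(best_len)
        # Blank out the matched words so they are not counted twice.
        for k in range(best_len):
            a[best_i + k] = None
            b[best_j + k] = None


def raw_lesk(lengths):
    """Square each phrasal match length and sum the squares."""
    return sum(n * n for n in lengths)


lengths = find_overlaps("red apple pie".split(), "apple pie is red".split())
print(lengths, raw_lesk(lengths))  # [2, 1] 5
```

Here "apple pie" is matched first (length 2), then "red" (length 1), so the
raw lesk score is 2^2 + 1^2 = 5.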

Consider the following example (line breaks introduced for clarity),
which measures the similarity of the two given strings:

 text_similarity.pl --type Text::Similarity::Overlaps --verbose
 --stoplist stoplist.txt --string
 'winston churchill was the prime minister of england'
 'prime minister of england winston churchill came for a visit that day'

 keys: 2
 -->'prime minister england' len(3) cnt(1)
 -->'winston churchill' len(2) cnt(1)
 wc 1: 5
 wc 2: 7
  Raw score: 5
  Precision: 0.714285714285714
  Recall   : 1
  F-measure: 0.833333333333333
  Dice     : 0.833333333333333
  E-measure: 0.166666666666667
  Cosine   : 0.845154254728517
  Raw lesk : 13
  Lesk     : 0.371428571428571
 0.833333333333333
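Every measure above except raw lesk and lesk can be reproduced from three
printed numbers: the raw score (5) and the two word counts (wc 1 = 5,
wc 2 = 7). The formulas in this Python sketch were inferred by matching
the output above, not copied from the module, so treat them as an
assumption. In particular, precision appears to divide by the second
string's word count and recall by the first.

```python
import math


def overlap_scores(raw, wc1, wc2):
    """Recompute the word-overlap measures from a raw score and two word counts."""
    precision = raw / wc2  # matched words relative to string 2 (inferred)
    recall = raw / wc1     # matched words relative to string 1 (inferred)
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f,
        "dice": 2 * raw / (wc1 + wc2),
        "e_measure": 1 - f,
        "cosine": raw / math.sqrt(wc1 * wc2),
    }


scores = overlap_scores(5, 5, 7)
for name, value in scores.items():
    print(f"{name:10s} {value:.6f}")
```

Note that F-measure and Dice coincide here, since 2PR/(P+R) reduces to
2 * raw / (wc1 + wc2) when precision and recall are defined this way.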

We find two phrasal matches, of lengths 3 and 2, so raw lesk scores them
as 3^2 + 2^2 = 13. That value is then divided by the product of the two
word counts (wc 1 * wc 2 = 5 * 7 = 35) to arrive at a normalized lesk
score of 13/35 = 0.3714. By default, WordNet-Similarity uses raw lesk.
Note that the raw score is simply the number of matching words (prime
minister england winston churchill) without regard to their order, and
that this value is the basis of all the other measures except raw lesk
and lesk. So, of the measures above, only raw lesk and lesk actually
consider phrasal matches and treat them differently.
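The difference between the raw score and lesk shows up when the same
matched words are grouped into phrases differently. The Python sketch
below assumes each matched word belongs to exactly one phrase, so the raw
score is the sum of the phrase lengths. It compares the example above
(phrases of length 3 and 2) with a hypothetical case where the same five
words match only as isolated single words.

```python
def summarize(phrase_lengths, wc1, wc2):
    """Return (raw score, raw lesk, normalized lesk) for given phrase lengths."""
    raw_score = sum(phrase_lengths)                 # matched words; phrasing ignored
    raw_lesk = sum(n * n for n in phrase_lengths)   # longer phrases weigh more
    lesk = raw_lesk / (wc1 * wc2)                   # normalize by product of word counts
    return raw_score, raw_lesk, lesk


# The example from this announcement: "prime minister england" + "winston churchill".
print(summarize([3, 2], 5, 7))
# Hypothetical: the same five words, but no two of them adjacent in both strings.
print(summarize([1, 1, 1, 1, 1], 5, 7))
```

Both cases have a raw score of 5, so precision, recall, F-measure, Dice,
E-measure, and cosine would be identical; only raw lesk (13 vs. 5) and
lesk (13/35 vs. 5/35) reward the longer phrases.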

This package provides both a command line program (text_similarity.pl)
and Perl API calls (examples in the SYNOPSIS sections of the CPAN
documentation).

You can find more information and download links at
http://text-similarity.sourceforge.net

I'm sure we'll continue to tinker with and extend Text Similarity, so
please do let us know of any suggestions you have.

Enjoy,
Ted

-- 
Ted Pedersen
http://www.d.umn.edu/~tpederse