Hi Yashar,
Thanks for your questions - see my responses inline...
On Tue, Jan 20, 2009 at 9:31 AM, Yashar Mehdad <yas...@ya...> wrote:
> Dear Ted
>
> I'm using your package in some of my experiments, and in order to cite it I
> need a few clarifications:
>
> When "text_similarity.pl --type=Text::Similarity::Overlaps file1 file2" is
> executed, a normalized measure is obtained. From what I understood from your
> documentation, this measure is the normalized raw score (F-measure = 2 *
> precision * recall / (precision + recall)). Is that right?
Correct. Consider the following example...
ted@ted-desktop:~$ text_similarity.pl test1 test2 --type=Text::Similarity::Overlaps --no-normalize
5
ted@ted-desktop:~$ text_similarity.pl test1 test2 --type=Text::Similarity::Overlaps
0.555555555555556
ted@ted-desktop:~$ more test1
this is test1 i am happy he is not
ted@ted-desktop:~$ more test2
this is test2 i am hungry she is sad
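The --no-normalize run (the first result) shows that 5 words have matched
(without regard to order or length of phrase). Since test1 and test2 each
contain 9 words, precision and recall are both 5/9, so the normalized
F-measure in the second result is also 5/9, or about 0.5556.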
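The normalized score can be checked by hand. This is a minimal Perl sketch, not code from Text::Similarity: the sub name f_measure is hypothetical, and it assumes precision is the match count divided by the word count of the first file and recall is the match count divided by the word count of the second (with 9 words in each file, the assignment makes no difference here).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::Similarity: F-measure from a raw
# overlap count and the word counts of the two texts. Assumes
# precision = score / wc1 and recall = score / wc2.
sub f_measure {
    my ($score, $wc1, $wc2) = @_;
    return 0 if $score == 0;    # avoid dividing by zero when nothing matched
    my $prec   = $score / $wc1;
    my $recall = $score / $wc2;
    return 2 * $prec * $recall / ($prec + $recall);
}

# test1 and test2 each contain 9 words, and 5 of them matched.
print f_measure(5, 9, 9), "\n";    # prints 0.555555555555556
```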
>
> When "text_similarity.pl --type=Text::Similarity::Overlaps --no-normalize
> file1 file2" is executed, is the output simply the raw overlap score rather
> than the raw Lesk score? Is that right?
Correct! There is no "bonus" for phrasal matching in the overlap scoring.
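To make the distinction concrete, here is a small Perl sketch, not taken from Overlaps.pm. The sub names are hypothetical; it assumes the overlaps in the test1/test2 example are "this is" and "i am" (2 words each) plus the single word "is", and it assumes the raw Lesk score adds up the square of each overlapping phrase's length, which is where phrasal matches would get their bonus.

```perl
use strict;
use warnings;

# Hypothetical helpers, not taken from Overlaps.pm. Each overlap is given
# only as its length in words.

# Raw overlap score: every matched word counts once, so a two-word phrase
# scores the same as two separate one-word matches.
sub raw_overlap {
    my $total = 0;
    $total += $_ for @_;
    return $total;
}

# Raw Lesk score (assumed definition): each overlap contributes the square
# of its length, which rewards longer phrasal matches.
sub raw_lesk {
    my $total = 0;
    $total += $_ * $_ for @_;
    return $total;
}

# Assumed overlaps for test1/test2: "this is", "i am" and "is".
my @lengths = (2, 2, 1);
print raw_overlap(@lengths), "\n";    # prints 5
print raw_lesk(@lengths), "\n";       # prints 9
```

Under this assumption, one two-word phrase scores 4 in the Lesk sum, while two unrelated one-word matches score 2; the plain overlap count gives 2 in both cases.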
>
> Is there any way to get the Lesk measure directly from text_similarity.pl
> by specifying some option? (I'm aware that the verbose option prints all
> the measures, but is there an option that leads directly to the Lesk
> measure?)
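Not from the command line; however, you could edit Overlaps.pm so that it
only outputs Lesk. Here's the relevant snippet, where I've commented out the
other measures. Note that this block only runs in verbose mode, so you would
still use the verbose option...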
if ($self->verbose) {
    # print " Raw score: $score\n";
    # print " Precision: $prec\n";
    # print " Recall : $recall\n";
    # print " F-measure: $f\n";
    # my $dice = 2 * $score / ($wc1 + $wc2) ;
    # print " Dice : $dice\n";
    # my $e = 1 - $f;
    # print " E-measure: $e\n";
    # my $cos = $score / sqrt ($wc1 * $wc2);
    # print " Cosine : $cos\n";
    my $lesk = $raw_lesk / ($wc1 * $wc2);
    # print " Raw lesk : $raw_lesk\n";
    print " Lesk : $lesk\n";
}
I know that's a bit messy, but it should be a fairly easy fix, in the
short term at least...
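To see how the measures in that snippet relate to one another, the following Perl sketch recomputes them for the test1/test2 example. It is a standalone illustration, not part of Text::Similarity: the sub name all_measures is hypothetical, the Dice, E-measure, cosine and Lesk formulas are copied from the snippet, precision and recall assume score / wc1 and score / wc2, and the raw Lesk value of 9 assumes each overlapping phrase contributes the square of its length.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the formulas in the verbose block of
# Overlaps.pm. Inputs: raw overlap score, raw Lesk score and the word
# counts of the two texts.
sub all_measures {
    my ($score, $raw_lesk, $wc1, $wc2) = @_;
    my $prec   = $score / $wc1;     # assumed orientation
    my $recall = $score / $wc2;     # assumed orientation
    my $f      = ($prec + $recall) ? 2 * $prec * $recall / ($prec + $recall) : 0;
    return (
        precision => $prec,
        recall    => $recall,
        f_measure => $f,
        dice      => 2 * $score / ($wc1 + $wc2),
        e_measure => 1 - $f,
        cosine    => $score / sqrt($wc1 * $wc2),
        lesk      => $raw_lesk / ($wc1 * $wc2),
    );
}

# test1/test2: 5 matched words and 9 words in each file; the raw Lesk
# value of 9 (2*2 + 2*2 + 1*1) is an assumption about phrase scoring.
my %m = all_measures(5, 9, 9, 9);
printf "%-9s : %.6f\n", $_, $m{$_} for sort keys %m;
```

Because both files have 9 words, precision, recall, F-measure, Dice and cosine all come out to 5/9 here and the E-measure to 4/9; Lesk comes out to 9/81, about 0.111, under the stated assumption.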
I hope this helps!
Ted
>
> Thanks in advance for your reply and help.
>
> Best regards
> Yashar.
>
> _______________________________________________
> text-similarity-users mailing list
> tex...@li...
> https://lists.sourceforge.net/lists/listinfo/text-similarity-users
>
>
--
Ted Pedersen
http://www.d.umn.edu/~tpederse