From: Ted P. <dul...@gm...> - 2009-01-20 23:49:14
Hi Yashar,

Thanks for your questions - see my responses inline...

On Tue, Jan 20, 2009 at 9:31 AM, Yashar Mehdad <yas...@ya...> wrote:
> Dear Ted
>
> I'm using your package in some of my experiments, and in order to cite it I
> need a few clarifications:
>
> When "text_similarity.pl --type=Text::Similarity::Overlaps file1 file2" is
> executed, a normalized measure is obtained. From what I understood of your
> documentation, this measure is the raw score normalized (F-measure = 2 *
> precision * recall / (precision + recall)). Is that right?

Correct. Consider the following example...

    ted@ted-desktop:~$ text_similarity.pl test1 test2 --type=Text::Similarity::Overlaps --no-normalize
    5
    ted@ted-desktop:~$ text_similarity.pl test1 test2 --type=Text::Similarity::Overlaps
    0.555555555555556
    ted@ted-desktop:~$ more test1
    this is test1 i am happy he is not
    ted@ted-desktop:~$ more test2
    this is test2 i am hungry she is sad

The --no-normalize run (the first result) shows that 5 words have matched
(without regard to order or length of phrase). Each file has 9 words, so
precision and recall are both 5/9, which gives the normalized F-measure of
0.5556 in the second result.

> While "text_similarity.pl --type=Text::Similarity::Overlaps --no-normalize
> file1 file2" is executed, the output would be a simple raw score of overlap,
> not the lesk raw score? Is that right?

Correct! There is no "bonus" for phrasal matching in the overlap scoring.

> Is there any way to reach the lesk measure with text_similarity.pl by
> defining some option? (I'm aware that by defining the verbose option we
> could get all measures, but is there any way that leads us directly to the
> lesk measure?)

Not from the command line; however, you could edit Overlaps.pm to just output
lesk.... Here's the relevant snippet, where I've commented out the other
output lines...

    if ($self->verbose) {
        # print " Raw score: $score\n";
        # print " Precision: $prec\n";
        # print " Recall : $recall\n";
        # print " F-measure: $f\n";
        # my $dice = 2 * $score / ($wc1 + $wc2) ;
        # print " Dice : $dice\n";
        # my $e = 1 - $f;
        # print " E-measure: $e\n";
        # my $cos = $score / sqrt ($wc1 * $wc2);
        # print " Cosine : $cos\n";
        my $lesk = $raw_lesk/ ($wc1 * $wc2);
        # print " Raw lesk : $raw_lesk\n";
        print " Lesk : $lesk\n";

I know that's a bit messy, but it should be a fairly easy fix in the short
term at least...

I hope this helps!
Ted

> Thanks in advance for your reply and help.
>
> Best regards
> Yashar.
>
> _______________________________________________
> text-similarity-users mailing list
> tex...@li...
> https://lists.sourceforge.net/lists/listinfo/text-similarity-users

--
Ted Pedersen
http://www.d.umn.edu/~tpederse
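
For anyone following the thread who wants to check the numbers by hand, here is a rough Python sketch of the formulas that appear in the snippet above. It is not the package's code. It assumes whitespace tokenization, a raw score equal to the multiset intersection of the two word lists (which gives 5 for the example files), and precision and recall taken over the second and first word counts respectively (the two coincide here because both files have 9 words). The function name `overlap_measures` and the `raw_lesk` parameter are made up for illustration; the phrasal raw lesk score is not computed here and has to be supplied by the caller.

```python
from collections import Counter
from math import sqrt


def overlap_measures(text1, text2, raw_lesk=None):
    """Sketch of the measures listed in the Overlaps.pm snippet.

    Assumptions (not taken from the package): words are split on
    whitespace, and the raw score is the multiset intersection size.
    """
    words1, words2 = text1.split(), text2.split()
    wc1, wc2 = len(words1), len(words2)

    # Count each shared word as many times as it occurs in both texts.
    score = sum((Counter(words1) & Counter(words2)).values())

    if score == 0:
        # No overlap (this also covers empty input): avoid dividing by zero.
        return {"raw": 0, "precision": 0.0, "recall": 0.0, "f_measure": 0.0,
                "dice": 0.0, "e_measure": 1.0, "cosine": 0.0, "lesk": None}

    prec = score / wc2      # assumption: precision is over the second text
    recall = score / wc1    # assumption: recall is over the first text
    f = 2 * prec * recall / (prec + recall)

    return {
        "raw": score,
        "precision": prec,
        "recall": recall,
        "f_measure": f,
        "dice": 2 * score / (wc1 + wc2),
        "e_measure": 1 - f,
        "cosine": score / sqrt(wc1 * wc2),
        # Normalized lesk as in the snippet: raw_lesk / (wc1 * wc2).
        "lesk": None if raw_lesk is None else raw_lesk / (wc1 * wc2),
    }


if __name__ == "__main__":
    m = overlap_measures("this is test1 i am happy he is not",
                         "this is test2 i am hungry she is sad")
    print(m["raw"], m["f_measure"])
```

On the two example files this yields a raw score of 5 and an F-measure of 5/9, in line with the command-line output quoted above.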