From: Ted P. <tpederse@d.umn.edu> - 2011-01-02 00:11:36
Hi Sean,
I'm happy to report that I think I've figured this out. I use American
spellings. :) The option is actually "normalize", whereas you were
using "normalise", which I guess was just being ignored. (Apparently we
don't take any action when an invalid option is specified, which is a
concern.) Once you make this change, I think things will work more as
you expect.
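Until the module itself complains about unknown options, a caller can guard against this kind of typo. The following is only a minimal sketch: check_options is a hypothetical helper, not part of Text::Similarity, and the list of known names contains only the two options mentioned in this thread.

```perl
use strict;
use warnings;

# Option names confirmed in this thread; extend the list as needed.
my %known = map { $_ => 1 } qw(normalize verbose);

# check_options is a hypothetical helper, not part of Text::Similarity.
# It returns the sorted list of option keys that are not in %known.
sub check_options {
    my ($opts) = @_;
    my @unknown = grep { !$known{$_} } keys %$opts;
    @unknown = sort @unknown;
    return @unknown;
}

# The misspelled key is reported instead of being silently ignored.
my %lapopts = ('normalise' => 1, 'verbose' => 1);
my @bad = check_options(\%lapopts);
warn "Unknown option(s): @bad\n" if @bad;
```

Calling this before $laptool->new(\%lapopts) would flag "normalise" up front.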
ted@linux-qdw9:~> cat ts3.pl
my $str1 = "the dog bit Jim";
my $str2 = "jim bit the dog ";
my $laptool = "Text::Similarity::Overlaps";
eval "require $laptool";
if ($@) {die "\nWARNING ! $laptool not loaded ..\n\n";}
my %lapopts = ('normalize' => 1, 'verbose' => 1); # 'verbose' = ++lesk-score
my $mod = $laptool->new(\%lapopts);
unless (defined($mod)) {print "FAILED '$laptool'\n"; exit 1;}
$score = $mod->getSimilarityStrings ($str1, $str2);
print "score= $score\n\n";
ted@linux-qdw9:~> perl ts3.pl
keys: 3
-->'bit' len(1) cnt(1)
-->'jim' len(1) cnt(1)
-->'the dog' len(2) cnt(1)
wc 1: 4
wc 2: 4
Raw score: 4
Precision: 1
Recall : 1
F-measure: 1
Dice : 1
E-measure: 0
Cosine : 1
Raw lesk : 6
Lesk : 0.375
score= 1
ted@linux-qdw9:~> cat ts4.pl
my $str1 = "the dog bit Jim";
my $str2 = "jim bit the dog ";
my $laptool = "Text::Similarity::Overlaps";
eval "require $laptool";
if ($@) {die "\nWARNING ! $laptool not loaded ..\n\n";}
my %lapopts = ('normalize' => 0, 'verbose' => 1); # 'verbose' = ++lesk-score
my $mod = $laptool->new(\%lapopts);
unless (defined($mod)) {print "FAILED '$laptool'\n"; exit 1;}
$score = $mod->getSimilarityStrings ($str1, $str2);
print "score= $score\n\n";
ted@linux-qdw9:~> perl ts4.pl
keys: 3
-->'bit' len(1) cnt(1)
-->'jim' len(1) cnt(1)
-->'the dog' len(2) cnt(1)
score= 4
BTW, I very much agree with your suggestion to add methods that
return particular scores. As you point out, others have made a similar
request, so I'll see if we can do something about that in the next few
months.
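For what it's worth, here is a rough sketch of what such a helper might compute from the numbers in the verbose output. It is not the module's API: all_measures is a hypothetical name, and the formulas are the standard definitions, assumed here with precision taken relative to the second string and recall relative to the first.

```perl
use strict;
use warnings;

# all_measures is a hypothetical helper, not part of Text::Similarity.
# $raw is the raw overlap score; $wc1 and $wc2 are the word counts of
# the first and second string. The formulas are assumed standard ones.
sub all_measures {
    my ($raw, $wc1, $wc2) = @_;
    my $precision = $wc2 ? $raw / $wc2 : 0;
    my $recall    = $wc1 ? $raw / $wc1 : 0;
    my $sum       = $precision + $recall;
    my $f_measure = $sum ? 2 * $precision * $recall / $sum : 0;
    return {
        raw       => $raw,
        precision => $precision,
        recall    => $recall,
        f_measure => $f_measure,
    };
}

# Values from the verbose run above: raw score 4, wc 1 = 4, wc 2 = 4.
my $m = all_measures(4, 4, 4);
printf "%-10s %s\n", $_, $m->{$_} for sort keys %$m;
```

Single-measure wrappers could then just pull one key out of the returned hashref.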
Cordially,
Ted
On Fri, Dec 31, 2010 at 11:41 PM, Sean <so...@or...> wrote:
> Sure, Ted, here it is:
>
> #---------------------------------------- CODE ----------------------------------------
> my $str1 = "the dog bit Jim";
> my $str2 = "jim bit the dog ";
> my $laptool = "Text::Similarity::Overlaps";
> eval "require $laptool";
> if ($@) {die "\nWARNING ! $tool not loaded ..\n\n";}
> #my %lapopts = ('normalise' => 0, 'verbose' => 1); # 'verbose' = ++lesk-score
> my %lapopts = ('normalise' => 1, 'verbose' => 1); # 'verbose' = ++lesk-score
> my $mod = $laptool->new(\%lapopts);
> unless (defined($mod)) {print "FAILED '$laptool'\n"; return 0;}
> $score = $mod->getSimilarityStrings ($str1, $str2);
> print "score= $score\n\n";
> #-------------------------------------- END CODE --------------------------------------
>
> My guess is that self->verbose is not actually getting properly set via the
> options?
>
> regards
> Sean
>
>
> Ted Pedersen wrote:
>
> Hi Sean,
>
> Thanks for your suggestions, let me take a look at those and see what
> we might be able to do.
>
> And I'm sorry you are having some troubles. Can you go ahead and post
> whatever code you are running to get these results? That will make it
> easier to recreate the output.
>
> Cordially,
> Ted
>
> On Fri, Dec 31, 2010 at 7:33 PM, Sean <so...@or...> wrote:
>
>
> Hello Ted
>
> I have installed v-0.08 and do not seem to get the results as documented.
>
> In order to get the Lesk-score I have tried setting the options 2
> different ways, both without luck.
> 1. ('normalise' => 0, 'verbose' => 1); # I expected this to work, not
> wanting the Lesk normalised ...
> 2. ('normalise' => 1, 'verbose' => 1); # also tried this just in case ...
>
> The COMPLETE screen-printed output from BOTH (using your doc example) is:
> "keys: 3 -->'bit' len(1) cnt(1) -->'jim' len(1) cnt(1)
> -->'the dog' len(2) cnt(1)"
>
> This is using the getSimilarityStrings ($str1, $str2) function directly
> from another script ..(getting a score of 4 returned there as expected).
>
> While at it I may as well mention what tops my wish-list for v-0.09. I
> would like to see additional simple wrapper functions like getLesk(),
> getCosine() which would return just the single measure specified, and
> getAll() which would return a hashref of 'named'-parameters to include
> all provided measures, and which the other functions would be simple
> wrappers around to pull out one or other from that comprehensive
> getAll()-hashref?
>
> This would avoid having to capture & parse output from stdout/stderr or
> some other arbitrary output channel, although it would probably do no
> harm to also "print" those measures. Since adding string (rather than
> file) acceptance obviously came as an afterthought itself, this might be
> the next logical extension to functionality. Looking at previous mailers
> I thought I detected similar requests, though expressed somewhat
> differently.
>
> Keep up the good work in 2011.
>
> Sean
>
--
Ted Pedersen
http://www.d.umn.edu/~tpederse