|
From: Ted P. <dul...@gm...> - 2015-10-08 00:54:29
|
We are pleased to announce the release of version 0.11 of Text::Similarity. This includes a few fixes and corrections supplied by users (which we are always most grateful for!). You can download the new version from CPAN or sourceforge via links found at http://text-similarity.sourceforge.net.

Below is the change log for this release. Finally, we are very open to other patches or ideas that users have, so please feel free to let us know!

0.11 Released October 6, 2015 (all changes by TDP)

Contributed enhancement by Tani Hosokawa:

Not a bug, but an optimization. The original version does an inefficient repeated linear search over text that can't possibly match. The new version instead precaches the locations of keywords. Comparing 100 semi-randomly generated, fairly similar documents of about 500 words each results in an approximately 90% speed increase, and the gain grows as the documents get larger.

https://rt.cpan.org/Public/Ticket/Attachment/999948/520850

Made various documentation/typo fixes as suggested by Alex Becker. Found in the CPAN bug list.

Enjoy,
Ted |
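The idea behind the 0.11 optimization can be sketched roughly as follows. This is an illustration in Python, not the module's Perl code: the function names are hypothetical, and the search for the longest shared phrase is an assumed stand-in for whatever matching the module actually performs. The point is only that indexing word positions once lets the search skip start positions that cannot match.

```python
from collections import defaultdict


def index_positions(words):
    """Map each word to the ascending list of positions where it occurs."""
    positions = defaultdict(list)
    for i, word in enumerate(words):
        positions[word].append(i)
    return positions


def _match_length(a, i, b, j):
    """Count how many consecutive words agree starting at a[i] and b[j]."""
    k = 0
    while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
        k += 1
    return k


def longest_match_linear(a, b):
    """Baseline: try every start in b for every start in a."""
    best = (0, 0, 0)  # (length, start in a, start in b)
    for i in range(len(a)):
        for j in range(len(b)):
            k = _match_length(a, i, b, j)
            if k > best[0]:
                best = (k, i, j)
    return best


def longest_match_indexed(a, b):
    """Same result, but only tries starts in b where the word a[i] occurs."""
    positions = index_positions(b)  # built once, reused for every word of a
    best = (0, 0, 0)
    for i, word in enumerate(a):
        for j in positions.get(word, ()):
            k = _match_length(a, i, b, j)
            if k > best[0]:
                best = (k, i, j)
    return best


a = "winston churchill prime minister england".split()
b = "prime minister england winston churchill came visit day".split()
print(longest_match_indexed(a, b))
```

Both functions visit candidate starts in the same order, so they agree on the result; the indexed version simply never looks at positions in `b` whose first word differs from `a[i]`.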
|
From: Ted P. <tpederse@d.umn.edu> - 2013-06-27 12:17:09
|
We are pleased to announce the release of version 0.10 of Text-Similarity. This release only includes a single fix, and that is a change to a test case that fails on Windows. Unless this sort of thing really bothers you, you probably don't need to update. :)

You can find the most current version on CPAN or at sourceforge:

http://text-similarity.sourceforge.net

However, there is a more important announcement, and that is that as of 0.10 Text-Similarity is again current in our sourceforge cvs archive. There were some transitions happening at sourceforge when 0.09 came out, so we did not use cvs. But, we are back to using cvs now, and that is always available for viewing or modifying if you are interested. Note that the cvs module name is now TS. As of now the web view hasn't been updated to include this new directory, but that should occur in the next day or two.

Additional instructions on using cvs are available in sourceforge:

http://sourceforge.net/p/text-similarity/code/?source=navbar

Enjoy, and please let us know if any questions arise.
Ted

--
Ted Pedersen
http://www.d.umn.edu/~tpederse |
|
From: Ted P. <tpederse@d.umn.edu> - 2013-01-22 21:01:05
|
Version 0.09 of Text::Similarity has been released on CPAN and sourceforge. This release includes two user contributions (that are very much appreciated). See details below, and feel free to download from http://text-similarity.sourceforge.net

0.09 Released January 22, 2013

* This release includes changes contributed by Myroslava Dzikovska that provide the full set of similarity scores programmatically. She modified the interface so that the getSimilarity function returns a pair ($score, %allScores) where %allScores is a hash of all possible scores that it computes. She made it so that in scalar context it will only return $score, so it is fully backwards compatible with the older versions. She also changed the printing to STDERR, to make it easier to use the code in filter scripts that depend on STDIN/STDOUT.

* This release also includes changes contributed by Nathan Glen to allow test cases to pass on Windows. The single quotes used previously caused arguments to the script not to be passed correctly, leading to test failures. The single quotes have been changed to double quotes.

Enjoy,
Ted |
|
From: Ted P. <tpederse@d.umn.edu> - 2010-06-13 15:55:13
|
We are pleased to announce the release of version 0.08 of Text-Similarity. This version has one important change: when you are using a stoplist, you can now specify stop words using regular expressions.

In previous versions a stoplist could be specified as follows (in a single file, one line per word):

a
of
in

This will cause a, of and in to be treated as stop words (and not be used in computing similarity). As of 0.08 you may continue to use the above format, or you can use regular expressions. For example...

/\b\w\b/
/\b\d+\b/

...would cause all single character words and numeric values to be removed.

You can get this new version via CPAN or sourceforge. Find links to both at:

http://text-similarity.sourceforge.net

Enjoy,
Ted and Ying

--
Ted Pedersen
http://www.d.umn.edu/~tpederse |
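To make the two stoplist formats concrete, here is a small Python sketch of how such a file could be interpreted. It is an illustration under assumptions, not the module's Perl implementation: the helper names are hypothetical, plain entries are assumed to match whole words only, and Python's `re` syntax is used in place of Perl's (the two agree for the patterns shown above).

```python
import re


def compile_stoplist(lines):
    """Turn stoplist lines into compiled patterns.

    A line of the form /.../ is taken as a regular expression;
    any other non-empty line is taken as a literal whole word.
    """
    patterns = []
    for line in lines:
        entry = line.strip()
        if not entry:
            continue
        if len(entry) >= 2 and entry.startswith("/") and entry.endswith("/"):
            patterns.append(re.compile(entry[1:-1]))
        else:
            patterns.append(re.compile(r"\b" + re.escape(entry) + r"\b"))
    return patterns


def remove_stop_words(text, patterns):
    """Delete every stoplist match, then split what remains into words."""
    for pattern in patterns:
        text = pattern.sub("", text)
    return text.split()


stoplist = ["a", "of", "in", r"/\b\w\b/", r"/\b\d+\b/"]
patterns = compile_stoplist(stoplist)
print(remove_stop_words("the 3 of a kind in 2010", patterns))
```

Note that the word boundary `\b` keeps the entry `in` from eating the middle of `kind`; only standalone occurrences are removed.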
|
From: Ted P. <dul...@gm...> - 2008-11-16 00:04:23
|
We are pleased to announce the release of version 0.07 of Text-Similarity. This release has a single fix to a test case that has caused trouble for Windows installation, so you should only worry about upgrading if you are using Windows, or if you are using a version earlier than 0.06 (which had a number of significant changes).

You can find download links from CPAN and sourceforge at http://text-similarity.sourceforge.net

Please let us know if you have any questions or concerns!

Cordially,
Ted

--
Ted Pedersen
http://www.d.umn.edu/~tpederse |
|
From: Ted P. <dul...@gm...> - 2008-04-06 14:50:09
|
We are pleased to announce the release of version 0.06 of Text-Similarity. This is a module that WordNet-Similarity uses in the computation of the lesk measure, and one of the new features in this release is a "lesk" score that does our calculation for "lesk overlap" for any pair of files or strings you provide to it.

As you may recall, the lesk measure takes glosses and compares them for overlaps (matches) and then scores them by taking the length of each phrasal match, squaring it, and then summing those scores. Consider the following example (line breaks introduced for clarity) which measures the two given strings for similarity:

    text_similarity.pl --type Text::Similarity::Overlaps --verbose
        --stoplist stoplist.txt
        --string 'winston churchill was the prime minister of england'
        'prime minister of england winston churchill came for a visit that day'

    keys: 2
    -->'prime minister england' len(3) cnt(1)
    -->'winston churchill' len(2) cnt(1)
    wc 1: 5
    wc 2: 7
    Raw score: 5
    Precision: 0.714285714285714
    Recall   : 1
    F-measure: 0.833333333333333
    Dice     : 0.833333333333333
    E-measure: 0.166666666666667
    Cosine   : 0.845154254728517
    Raw lesk : 13
    Lesk     : 0.371428571428571
    0.833333333333333

We find two phrasal matches of length 2 and 3, so those are scored (by raw lesk) as 2^2 + 3^2 = 13. That is then scaled by the product of the two string lengths (the word counts wc 1 and wc 2 above, so 5 * 7 = 35) to arrive at a normalized lesk score of 13/35. By default WordNet Similarity uses raw lesk.

Note that the raw score is simply the number of matching words (prime minister england winston churchill) without regard to their order, and that this value is the basis of all the other measures except for raw lesk and lesk. So, of the measures above, only lesk really considers phrasal matches and treats them differently.

This package provides both a command line program (text_similarity.pl) and Perl API calls (examples in the SYNOPSIS sections of the CPAN documentation).

You can find more info and download links at http://text-similarity.sourceforge.net

I'm sure we'll continue to tinker with and extend Text Similarity, so please do let us know of any suggestions you have.

Enjoy,
Ted

--
Ted Pedersen
http://www.d.umn.edu/~tpederse |
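The arithmetic behind the sample output can be reproduced with a short Python sketch. Only the raw lesk and lesk formulas are stated in the announcement; the formulas for precision, recall, F-measure, Dice, E-measure and cosine below are assumptions inferred from the numbers in the output, and the function name and dictionary keys are hypothetical. This is not the module's own code.

```python
import math


def overlap_scores(match_lengths, wc1, wc2):
    """Compute the reported scores from phrasal match lengths and word counts.

    match_lengths: length in words of each phrasal match, e.g. [3, 2]
    wc1, wc2: word counts of the two texts after stop word removal
              (assumed to be greater than zero)
    """
    raw = sum(match_lengths)                  # matching words, order ignored
    precision = raw / wc2                     # assumed: inferred from 5/7
    recall = raw / wc1                        # assumed: inferred from 5/5
    if precision + recall > 0:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    raw_lesk = sum(n * n for n in match_lengths)  # stated: squared lengths
    return {
        "raw": raw,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "dice": 2 * raw / (wc1 + wc2),        # assumed
        "e_measure": 1 - f_measure,           # assumed
        "cosine": raw / math.sqrt(wc1 * wc2), # assumed
        "raw_lesk": raw_lesk,
        "lesk": raw_lesk / (wc1 * wc2),       # stated: scaled by the product
    }


scores = overlap_scores([3, 2], wc1=5, wc2=7)
for name, value in scores.items():
    print(f"{name}: {value}")
```

With the match lengths 3 and 2 and the word counts 5 and 7 from the example, these formulas agree with the printed output to the displayed precision, which is the only evidence offered here that the inferred ones are right.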
|
From: Ted P. <dul...@gm...> - 2008-04-04 19:29:39
|
We are pleased to announce the release of Text-Similarity version 0.05. This version allows users to measure two strings for similarity, in addition to being able to measure two files (which was the existing functionality). http://text-similarity.sourceforge.net Enjoy, Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse |