Archive: 2004: Nov (2) | 2008: Apr (3), Sep (1) | 2010: Jun (1) | 2013: Jan (1), Jun (1) | 2015: Oct (1)
From: Ted P. <dul...@gm...> - 2015-10-08 00:54:29
We are pleased to announce the release of version 0.11 of Text::Similarity. This includes a few fixes and corrections supplied by users (for which we are always most grateful!). You can download the new version from CPAN or sourceforge via links found at http://text-similarity.sourceforge.net. Below is the change log for this release. Finally, we are very open to other patches or ideas that users have, so please feel free to let us know!

0.11 Released October 6, 2015 (all changes by TDP)

* Contributed enhancement by Tani Hosokawa. Not a bug, but an optimization: the original version does an inefficient repeated linear search over text that can't possibly match. Instead, this precaches the locations of keywords. Comparing 100 semi-randomly generated, fairly similar documents of about 500 words each results in an approximately 90% speed increase, and the efficiency gain grows as the documents get larger.
  https://rt.cpan.org/Public/Ticket/Attachment/999948/520850

* Made various documentation/typo fixes as suggested by Alex Becker. Found in the CPAN bug list.

Enjoy,
Ted
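The precaching idea named in the change log can be sketched as follows. This is a standalone illustration of the technique, with hypothetical names, not the actual patch applied to Text::Similarity: instead of rescanning one text for every word of the other, we index word positions once and replace each linear scan with a hash lookup.

```perl
#!/usr/bin/perl
# Hypothetical sketch of the precaching optimization described above --
# not the actual Text::Similarity internals.
use strict;
use warnings;

# Build a hash mapping each word to the list of positions where it occurs.
sub index_positions {
    my @words = @_;
    my %where;
    push @{ $where{ $words[$_] } }, $_ for 0 .. $#words;
    return \%where;
}

# Count word tokens of the first text that occur anywhere in the second,
# using the precomputed index: each check is a hash access rather than a
# linear search over the whole second text.
sub overlap_count {
    my ($words1, $index2) = @_;
    my $count = 0;
    $count++ for grep { exists $index2->{$_} } @$words1;
    return $count;
}

my @doc1   = qw(the cat sat on the mat);
my @doc2   = qw(a cat lay on a rug);
my $index2 = index_positions(@doc2);
print overlap_count(\@doc1, $index2), "\n";   # 2 ("cat" and "on")
```

The win is exactly the one the ticket describes: lookups that can't possibly match cost a single hash probe instead of a pass over the other document, so the saving grows with document size.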
From: Ted P. <tpederse@d.umn.edu> - 2013-06-27 12:17:08
We are pleased to announce the release of version 0.10 of Text-Similarity. This release only includes a single fix, and that is a change to a test case that fails on Windows. Unless this sort of thing really bothers you, you probably don't need to update. :) You can find the most current version on CPAN or at sourceforge: http://text-similarity.sourceforge.net

However, there is a more important announcement: as of 0.10, Text-Similarity is again current in our sourceforge cvs archive. There were some transitions happening at sourceforge when 0.09 came out, so we did not use cvs then. But we are back to using cvs now, and it is always available for viewing or modifying if you are interested. Note that the cvs module name is now TS. As of now the web view hasn't been updated to include this new directory, but that should occur in the next day or two. Additional instructions on using cvs are available at sourceforge: http://sourceforge.net/p/text-similarity/code/?source=navbar

Enjoy, and please let us know if any questions arise.

Ted

--
Ted Pedersen
http://www.d.umn.edu/~tpederse
From: Ted P. <tpederse@d.umn.edu> - 2013-01-22 21:01:04
Version 0.09 of Text::Similarity has been released on CPAN and sourceforge. This release includes two user contributions (which are very much appreciated). See details below, and feel free to download from http://text-similarity.sourceforge.net

0.09 Released January 22, 2013

* This release includes changes contributed by Myroslava Dzikovska that provide the full set of similarity scores programmatically. She modified the interface so that the getSimilarity function returns a pair ($score, %allScores), where %allScores is a hash of all possible scores that it computes. In scalar context it returns only $score, so it is fully backwards compatible with the older versions. She also changed the printing to STDERR, to make it easier to use the code in filter scripts that depend on STDIN/STDOUT.

* This release also includes changes contributed by Nathan Glen to allow test cases to pass on Windows. The single quote used previously caused arguments to the script not to be passed correctly, leading to test failures. The single quotes have been changed to double quotes.

Enjoy,
Ted
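The backwards-compatible, context-sensitive return described above is a standard Perl idiom built on wantarray. Here is a minimal standalone sketch; the score names and values are made up for illustration, and this is not the module's own code:

```perl
#!/usr/bin/perl
# Sketch of a context-sensitive return like the 0.09 interface change:
# list context yields ($score, %allScores), scalar context just $score.
use strict;
use warnings;

sub getSimilarity {
    # Hypothetical fixed scores, for illustration only.
    my %allScores = (raw => 3, lesk => 0.75, cosine => 0.81);
    my $score = $allScores{raw};
    # wantarray is true in list context, false in scalar context.
    return wantarray ? ($score, %allScores) : $score;
}

my $just_score = getSimilarity();          # scalar context: 3
my ($score, %scores) = getSimilarity();    # list context
print "$just_score $score $scores{cosine}\n";   # 3 3 0.81
```

Because older callers assign the result to a scalar, they keep getting a single number, while new callers can unpack the full hash of scores.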
From: Ted P. <tpederse@d.umn.edu> - 2010-06-13 15:55:13
We are pleased to announce the release of version 0.08 of Text-Similarity. This version includes one important change: when you are using a stoplist, you can now specify stop words using regular expressions.

In previous versions a stoplist could be specified as follows (in a single file, one line per word):

a
of
in

This causes a, of, and in to be treated as stop words (and not used in computing similarity). As of 0.08 you may continue to use the above format, or you can use regular expressions. For example:

/\b\w\b/
/\b\d+\b/

would cause all single-character words and numeric values to be removed.

You can get this new version via CPAN or sourceforge - find links to both at: http://text-similarity.sourceforge.net

Enjoy,
Ted and Ying

--
Ted Pedersen
http://www.d.umn.edu/~tpederse
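One way entries like those above could be applied is sketched below. This is an illustration of the idea only, not the module's actual implementation (which reads the entries from a stoplist file): entries wrapped in slashes are compiled as regexes, and bare entries match as whole words.

```perl
#!/usr/bin/perl
# Sketch of mixed literal/regex stoplist handling, as described above.
use strict;
use warnings;

# Stoplist entries, one per "line": literal words or /regex/ patterns.
my @stoplist = ('a', 'of', 'in', '/\b\w\b/', '/\b\d+\b/');

# Compile each entry: /.../ becomes a regex, literals match exactly.
my @patterns = map {
    m{^/(.*)/$} ? qr/$1/ : do { my $w = quotemeta $_; qr/^$w$/ }
} @stoplist;

# Keep only the words that match none of the stop patterns.
sub remove_stopwords {
    my @words = @_;
    return grep { my $w = $_; !grep { $w =~ $_ } @patterns } @words;
}

my @kept = remove_stopwords(qw(a box of 42 nails in x crates));
print "@kept\n";   # box nails crates
```

Here "a", "of", and "in" are removed as literals, while "x" and "42" are caught by the two regex entries, matching the single-character and numeric examples in the announcement.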
From: Ted P. <dul...@gm...> - 2008-09-24 03:08:02
---------- Forwarded message ----------
From: CPAN Tester Report Server <do_...@cp...>
Date: Tue, Sep 23, 2008 at 10:47 PM
Subject: CPAN Testers Daily Report
To: Ted Pedersen <TPE...@cp...>

Dear Ted Pedersen,

CPAN Testers Notifications have changed. This mail now comes from a centralised server, and authors should no longer be receiving reports directly from testers. If you do receive reports, please ask the tester in question to update their version of Test-Reporter, which now disables the CCing to authors. Thanks.

Please find below the latest reports for your distributions, generated by CPAN Testers, from the last 24 hours. Currently only FAIL reports are listed, with only the first instance of a report for a distribution on a particular platform, using a specific version of Perl. As such you may find further similar reports at http://www.cpantesters.org.

Text-Similarity-0.06:
- MSWin32-x86-multi-thread / 5.10.0:
  - FAIL http://nntp.x.perl.org/group/perl.cpan.testers/2282033

This mail is generated by an automated system. If you do not wish to receive these mails, please contact Barbie <ba...@cp...> and request to be removed from the automatic mailings. If you have an issue with a particular report, or wish to gain further information from the tester, please use the 'Find A Tester' tool at http://stats.cpantesters.org/cpanmail.html, using the NNTP ID of the report to locate the correct email address.

Thanks,
The CPAN Testers
Reports: http://www.cpantesters.org

--
Ted Pedersen
http://www.d.umn.edu/~tpederse
From: Ted P. <dul...@gm...> - 2008-04-11 15:55:46
Hi Sid,

This looks great, and should actually be very helpful for both Text-Similarity and SenseRelate, since both have compoundify operations. I think having a new release of WordNet-Similarity with this and the other changes you have in the cooker is a great idea.

I was thinking of making some small changes to the documentation in our /util programs and the web interface programs, mostly so that they look a little better on CPAN (that is, cleaning up the NAME entries, things like that). So I will tinker around with that this morning - I'm sure it won't be very substantial, nor will it take very much time - then perhaps we can release thereafter.

Thanks!
Ted

On Fri, Apr 11, 2008 at 4:43 AM, Siddharth Patwardhan <si...@cs...> wrote:
> Hi Ted,
>
> > Ah, very interesting. I didn't realize this was how things were
> > structured now, but it makes good sense. I think that compounds.pl
> > program is very neat, and having a getCompounds method would actually
> > be potentially very useful for users. I think it's a natural enough
> > question to ask - that is, what are the compounds in WordNet - so
> > having that as a part of a Tools package makes good sense to me.
> >
> > I think what Text::Similarity needs is probably independent of WordNet
> > - that is, it really just needs that string matching logic used in
> > compoundify - given a list of compounds, find them in a given text -
> > so in that case a getCompounds method would be very handy (if we
> > wanted to find WordNet compounds), or the user could provide their own
> > list from some other source and then match in about the same way. The
> > matching logic is already in Text-Similarity, and in fact it might
> > work as it is; I haven't looked at that too deeply as yet...
> >
> > So, anyway, I do think a getCompounds method in WordNet::Tools could
> > be very useful for those modules like Text-Similarity that might like
> > to go looking for WordNet compounds. Probably we wouldn't want to
> > build in a dependence on WordNet-Similarity though, so we'd just run
> > that once and then provide the compounds to Text-Similarity. Having
> > that list in a "Perl form" would be nice, as that would make it easy
> > to send into Text-Similarity...
>
> I just added a method getCompoundsList() to WordNet::Tools and committed
> it to CVS. A simple program that mimics compounds.pl, using this new
> method, will look like this:
>
> #!/usr/bin/perl
>
> use WordNet::QueryData;
> use WordNet::Tools;
>
> my $wn = WordNet::QueryData->new();
> die "Error: Unable to create WordNet::QueryData object.\n"
>     if(!defined($wn));
>
> my $wntools = WordNet::Tools->new($wn);
> die "Error: Unable to create WordNet::Tools object.\n"
>     if(!defined($wntools));
>
> my $arref = $wntools->getCompoundsList();
> die "Error: No list returned.\n" if(!defined($arref));
>
> foreach my $key (@{$arref})
> {
>     print "$key\n";
> }
>
> I guess this new method will become available with the next release of
> WordNet-Similarity, which can be pretty soon.
>
> Thanks.
>
> -- Sid.

--
Ted Pedersen
http://www.d.umn.edu/~tpederse
From: Ted P. <dul...@gm...> - 2008-04-10 17:49:03
Hi Sid,

See comments below...

On Wed, Apr 9, 2008 at 11:19 PM, Siddharth Patwardhan <si...@cs...> wrote:
> Hi Ted,
>
> So... a getCompounds method could very easily be added to
> WordNet::Tools. Currently, the WordNet::Tools constructor (new())
> builds a list of compounds internally for use with compoundify.
> This is different from how we did things before -- i.e., first
> generate a compounds.txt file using compounds.pl, and then
> use the compounds.txt in compoundify. The compoundify in
> WordNet::Tools simply generates a list of compounds from WordNet
> at startup, and then uses this list as its list of compounds.
> A new getCompounds method in WordNet::Tools could simply return this
> internal list, if required.

Ah, very interesting. I didn't realize this was how things were structured now, but it makes good sense. I think that compounds.pl program is very neat, and having a getCompounds method would actually be potentially very useful for users. I think it's a natural enough question to ask - that is, what are the compounds in WordNet - so having that as a part of a Tools package makes good sense to me.

I think what Text::Similarity needs is probably independent of WordNet - that is, it really just needs that string matching logic used in compoundify - given a list of compounds, find them in a given text - so in that case a getCompounds method would be very handy (if we wanted to find WordNet compounds), or the user could provide their own list from some other source and then match in about the same way. The matching logic is already in Text-Similarity, and in fact it might work as it is; I haven't looked at that too deeply as yet...

So, anyway, I do think a getCompounds method in WordNet::Tools could be very useful for those modules like Text-Similarity that might like to go looking for WordNet compounds. Probably we wouldn't want to build in a dependence on WordNet-Similarity though, so we'd just run that once and then provide the compounds to Text-Similarity. Having that list in a "Perl form" would be nice, as that would make it easy to send into Text-Similarity...

> I just wanted to point out that the "hash-code" for the different
> versions of WordNet isn't really a standard. It was just something we
> (rather, Ben Haskel) came up with, to generate an identifier for
> WordNet, from the WordNet data files. We just run an SHA1 hash function
> over the WordNet data file names and their sizes, to get this unique
> identifier. But someone could easily come up with a different way to
> generate a WordNet version identifier. Also, if it so happens that two
> different WordNet versions have data files with the exact same sizes,
> then they would get the same identifier. So, this method is not perfect.
> But I think it works, in general, since different versions of WordNet
> are unlikely to have the exact same file sizes.

Thanks for clarifying this - I do think the SHA1 idea *should* provide unique identifiers, and in fact I think it might even be overly unique, in that a Windows 2.0 and a Unix 2.0 should have different values (I assume there must be some formatting differences that cause them to be rather different). But I actually think that is good, in that it would make it possible to identify the exact WordNet version being used. But I do agree, we'll want to be on the alert for a WordNet that somehow has the same SHA1 values as another version. It doesn't seem likely, unless WordNet were to release a version that differed only in respect to documentation and not the data files, but that doesn't seem to be their style.

> Anyway, if more and more people start using it, it could become a
> standard. But I guess for now maybe it would be better to refer to it
> as our internal WordNet version identifier, or something.

Agreed - best to make it clear we are the ones producing that hash, and not potentially confuse WordNet users who then expect that elsewhere.

Thanks!
Ted

> -- Sid.
>
> On Wed, 2008-04-09 at 21:39 -0500, Ted Pedersen wrote:
> > Hi Sid,
> >
> > Yes, I think WordNet::Tools is terrific... there is in fact a kind of
> > interesting issue there - compoundify could even be viewed as WordNet
> > independent - it really just needs a list of compounds from
> > somewhere... and I think there are possibly some issues like that with
> > the Freq.pl programs - not really compoundify issues, but
> > functionality that is primarily text based and doesn't need WordNet -
> > and indeed there is some redundancy between those programs.
> > Eliminating that redundancy has been on my list of things to do for
> > some time, and I think it would really be a nice enhancement to
> > things...
> >
> > Anyway, the reason I was thinking about compoundify in a WordNet
> > independent sense is that Text-Similarity wants to have a compounding
> > operation included, but it doesn't currently have one (or the one it
> > has doesn't seem to actually work...). So... I don't know if it would
> > make sense at all to think about a WordNet Tool that just provided a
> > list of compounds and then a separate Text::Compoundify module... That
> > actually almost feels like a QueryData method... getCompounds or
> > something... hmmm...
> >
> > As to other WordNet functionality, I just added some constants for my
> > hash values to refer to wordnet versions more conveniently - I was
> > kind of wishing that WordNet-QueryData would go ahead and do that
> > conversion so that we could get reliable values from version() again,
> > and in fact that's what confused me earlier today. I had thought that
> > was done but I don't think it was... so anyway... that does seem like
> > an operation that users (maybe developers) might end up doing -
> > figuring out a table of hash to wordnet version values...
> >
> > I wonder too, did we ever figure out if the hash values differ on
> > Windows? I suppose they must... so that's another possible point of
> > failure, but... well, one thing at a time. :)
> >
> > Otherwise, I think we've done a pretty good job of "exposing" the
> > functionality of WordNet-Similarity so that people can get at some of
> > the interesting functions (like finding hypernym trees, depths, etc.),
> > and I don't notice much duplication any more except, as you say, in
> > some of the /utils... but certainly worth thinking about, especially
> > as both SenseRelate and maybe even Text-Similarity start to grow up a
> > bit and make use of different sorts of functionality...
> >
> > Thanks!
> > Ted
> >
> > On Wed, Apr 9, 2008 at 9:17 PM, Siddharth Patwardhan <si...@cs...> wrote:
> > > > WordNet::Tools (a module included in WordNet::Similarity) is
> > > > something we will need to exploit in WordNet::SenseRelate - it
> > > > does two things that are important for us there, providing
> > > > reliable version information, and then doing compoundify. We do
> > > > compoundify in many different modules, but I think it makes sense
> > > > to centralize it in one place, and I think that place is
> > > > WordNet::Tools...
> > >
> > > Right. That was the motivation behind creating WordNet::Tools...
> > > centralizing some common functions. Compoundify was present in many
> > > different modules and programs *within* WordNet::Similarity itself.
> > > And we updated the code to make it faster (twice, I think). And each
> > > time we had to change all the different instances of the same
> > > function. So, we centralized it into WordNet::Tools.
> > >
> > > On that note, if you come across any other WordNet-specific function
> > > that can be centralized, you may want to consider putting it into
> > > WordNet::Tools. (Hmmm... now that I think about it, there is quite
> > > a bit of redundancy in the *Freq.pl programs... I wonder how much
> > > of that is WN-specific.)
> > >
> > > -- Sid.
> >
> > --
> > Ted Pedersen
> > http://www.d.umn.edu/~tpederse
> >
> > _______________________________________________
> > senserelate-developers mailing list
> > sen...@li...
> > https://lists.sourceforge.net/lists/listinfo/senserelate-developers

--
Ted Pedersen
http://www.d.umn.edu/~tpederse
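The WordNet-independent compoundify discussed in this thread - given a list of compounds, find and join them in a text - can be sketched as a greedy longest-match pass. The compound list below is a hypothetical two-word sample (in practice it would come from something like getCompoundsList()); this is an illustration of the matching logic, not the code in WordNet::Tools:

```perl
#!/usr/bin/perl
# Sketch of a WordNet-independent compoundify: join word sequences that
# appear in a known-compound list, using greedy longest-match.
use strict;
use warnings;

my %compounds = map { $_ => 1 } qw(machine_gun goose_step water_ski);
my $max_len = 2;   # longest compound in this sample list is two words

sub compoundify {
    my @words = @_;
    my @out;
    my $i = 0;
    while ($i < @words) {
        my $matched = 0;
        # Try the longest candidate first, shrinking toward two words.
        for (my $len = $max_len; $len >= 2; $len--) {
            next if $i + $len > @words;
            my $cand = join '_', @words[$i .. $i + $len - 1];
            if ($compounds{$cand}) {
                push @out, $cand;
                $i += $len;
                $matched = 1;
                last;
            }
        }
        unless ($matched) { push @out, $words[$i]; $i++; }
    }
    return @out;
}

print join(' ', compoundify(qw(the machine gun and the goose step))), "\n";
# the machine_gun and the goose_step
```

Nothing here touches WordNet itself, which is the point made in the thread: the caller supplies the compound list, whether it comes from WordNet::Tools or from any other source.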
From: Ted P. <tpederse@d.umn.edu> - 2008-04-04 19:34:43
Hi Sid,

I just released version 0.05 of text-similarity, to address the issue of being able to compare strings in addition to files. The major change was adding a getSimilarityStrings method, which more or less turns getSimilarity into a file-processing front end to it. The string processing functionality was of course already in getSimilarity, it was just not really exposed because of the file input. So I split them apart, and now a user can input strings to getSimilarityStrings, or files to getSimilarity. I think WordNet-Similarity is unaffected by all this, and even if it were using getSimilarity, that functionality is still the same.

I also modified text_compare.pl to have a --string option so that a user can input strings from the command line.

So, I guess the plan to give greater visibility to text-similarity is working, although it does lead to more work. :) Let me know if you see anything amiss with this!

Thanks,
Ted

--
Ted Pedersen
http://www.d.umn.edu/~tpederse
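The split described above - getSimilarity as a file-reading front end to getSimilarityStrings - follows a common refactoring pattern. Here is a standalone sketch: the two function names come from the announcement, but the scoring function is a trivial stand-in (fraction of shared word tokens), not the module's actual overlap measure.

```perl
#!/usr/bin/perl
# Sketch of the getSimilarity / getSimilarityStrings split: the file
# interface just slurps its inputs and delegates to the string version.
use strict;
use warnings;

# Stand-in score: fraction of words in the first string that also occur
# in the second (illustration only, not the real similarity measure).
sub getSimilarityStrings {
    my ($s1, $s2) = @_;
    my %w2   = map { $_ => 1 } split ' ', $s2;
    my @w1   = split ' ', $s1;
    my $hits = grep { $w2{$_} } @w1;
    return @w1 ? $hits / @w1 : 0;
}

# File front end: slurp both files, then delegate to the string version.
sub getSimilarity {
    my ($f1, $f2) = @_;
    my @text;
    for my $f ($f1, $f2) {
        open my $fh, '<', $f or die "Cannot open $f: $!";
        local $/;   # slurp mode
        push @text, scalar <$fh>;
    }
    return getSimilarityStrings(@text);
}

# 2 of 6 words ("cat", "on") are shared.
print getSimilarityStrings('the cat sat on the mat',
                           'a cat lay on a rug'), "\n";
```

The design benefit is the one the announcement notes: the string logic exists in exactly one place, and the file interface keeps working unchanged for existing callers like WordNet-Similarity.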
From: Jason R M. <mich0212@d.umn.edu> - 2004-11-12 21:53:35
There are 44 compounds that are both nouns and verbs in WordNet. I think you're right that compounds are more likely to be nouns than verbs.

FYI, here are the 44 compounds:

split_up water_ski contra_danse freak_out roller_blade single_crochet ice_skate bar_mitzvah push_back deep_freeze gold_plate ski_jump single_stitch turn_around letter_bomb black_marketeer shell_stitch nolle_prosequi slam_dance get_together roller_skate scotch_tape test_drive write_up speed_skate cave_in black_market tap_dance strip_mine bat_mitzvah call_up machine_gun break_dance folk_dance square_dance double_crochet mop_up goose_step purl_stitch double_cross kick_up roll_in_the_hay belly_dance double_stitch

ted pedersen wrote:
> Hi Jason,
>
> This is really interesting. I hadn't thought of this before, but
> I see exactly what you are referring to.
>
> I'd suggest the following - my experience has been that compounds
> are very often nouns (not always, but more often than they are
> verbs). So if we can't do any better, I'd suggest assuming that
> a compound is a noun.
>
> Actually, now that I think of it - I wonder if there are many compounds
> that are both nouns and verbs (at least those known to WordNet)? I would
> doubt it. Would that be a useful fact?
>
> I'll think about this some more...
>
> Thanks!
> Ted
From: ted p. <tpederse@d.umn.edu> - 2004-11-12 21:37:35
Hi Jason,

This is really interesting. I hadn't thought of this before, but I see exactly what you are referring to.

I'd suggest the following - my experience has been that compounds are very often nouns (not always, but more often than they are verbs). So if we can't do any better, I'd suggest assuming that a compound is a noun.

Actually, now that I think of it - I wonder if there are many compounds that are both nouns and verbs (at least those known to WordNet)? I would doubt it. Would that be a useful fact?

I'll think about this some more...

Thanks!
Ted

On Fri, 12 Nov 2004, Jason Michelizzi wrote:
> I've come across a slight difficulty in working with compoundifying
> and converting POS tags from the Penn Treebank format to WN format.
> If we do compoundification on tagged words, it seems that we have to
> discard the POS tags. The problem is that there are compound words
> that belong to more than one part of speech, such as machine_gun and
> goose_step (both of them can be either nouns or verbs).
>
> So if we came across text such as "goose/NN step/NN" or "machine/NN
> gun/NN", we could only turn that into "goose_step" and "machine_gun",
> but not "goose_step#n" or "machine_gun#v". (The fact that step is
> tagged as a noun isn't much of a help; the Brill tagger always seems
> to tag it as a noun in the few experiments I tried, except when I had
> "stepped" or "stepping" instead.)
>
> Jason

--
Ted Pedersen
http://www.d.umn.edu/~tpederse
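The ambiguity Jason describes can be shown in a few lines: once the Penn Treebank tags are stripped to form the compound, both WordNet parts of speech remain possible. The POS table below is hypothetical and hard-coded from the thread's examples, not a real WordNet lookup:

```perl
#!/usr/bin/perl
# Illustration of the POS ambiguity discussed above: joining tagged
# words forces us to discard the tags, and some compounds are both
# nouns and verbs, so the resulting POS is ambiguous.
use strict;
use warnings;

# Hypothetical parts of speech for two of the noun/verb compounds.
my %pos = (machine_gun => [qw(n v)], goose_step => [qw(n v)]);

# "goose/NN step/NN" -> strip the tags, join with underscore,
# then see which parts of speech remain possible.
my @tagged   = qw(goose/NN step/NN);
my $compound = join '_', map { (split '/')[0] } @tagged;
my @choices  = @{ $pos{$compound} || [] };
print "$compound: ", join('/', @choices), "\n";   # goose_step: n/v
```

As the thread suggests, one heuristic for breaking the tie would be to default to the noun reading, since compounds are more often nouns than verbs.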