From: Siddharth P. <si...@cs...> - 2008-04-10 04:20:09
Hi Ted,

So... a getCompounds method could very easily be added to WordNet::Tools. Currently, the WordNet::Tools constructor (new()) builds a list of compounds internally for use with compoundify. This is different from how we did things before, i.e., first generate a compounds.txt file using compounds.pl, and then use compounds.txt in compoundify. The compoundify in WordNet::Tools simply generates a list of compounds from WordNet at startup and then uses that as its list of compounds. A new getCompounds method in WordNet::Tools could simply return this internal list, if required.

I just wanted to point out that the "hash-code" for the different versions of WordNet isn't really a standard. It was just something we (or rather Ben Haskel) came up with to generate an identifier for WordNet from the WordNet data files. We just run an SHA1 hash function over the WordNet data file names and their sizes to get this identifier. But someone could easily come up with a different way to generate a WordNet version identifier. Also, if two different WordNet versions happen to have data files with exactly the same sizes, they would get the same identifier. So, this method is not perfect. But I think it works in general, since different versions of WordNet are unlikely to have exactly the same file sizes.

Anyway, if more and more people start using it, it could become a standard. But I guess for now it would be better to refer to it as our internal WordNet version identifier, or something.

-- Sid.
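For readers unfamiliar with the scheme, here is a minimal Python sketch of an identifier computed by running SHA1 over data file names and sizes, as described above. It is not the WordNet::Tools code (which is Perl): the choice of files (data.adj, data.adv, data.noun, data.verb), the sorted order, and the "name:size" serialization are assumptions, and `wordnet_id` and `version_from_id` are hypothetical names. The table passed to `version_from_id` stands in for the kind of hash-to-version table Ted mentions below; no real hash values are given here.

```python
import hashlib
import os

# Assumption: the four WordNet data files are hashed. The actual file list
# and serialization used by WordNet::Tools may differ.
DATA_FILES = ("data.adj", "data.adv", "data.noun", "data.verb")


def wordnet_id(dict_dir, files=DATA_FILES):
    """Return a hex SHA1 digest over each data file's name and size in bytes.

    Hypothetical helper; file contents are never read, only sizes.
    """
    digest = hashlib.sha1()
    for name in sorted(files):  # fixed order so the ID is reproducible
        size = os.path.getsize(os.path.join(dict_dir, name))
        # Hypothetical serialization: one "name:size" line per file.
        digest.update(f"{name}:{size}\n".encode("ascii"))
    return digest.hexdigest()


def version_from_id(ident, known_ids):
    """Look up a version string in a caller-supplied {id: version} table."""
    return known_ids.get(ident, "unknown")
```

Because only names and sizes enter the hash, two directories whose files differ in content but not in size get the same identifier, which is exactly the limitation described above. By the same token, anything that changes a file's size on disk (converting the files to CRLF line endings, for example) changes the identifier.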
On Wed, 2008-04-09 at 21:39 -0500, Ted Pedersen wrote:
> Hi Sid,
>
> Yes, I think WordNet::Tools is terrific... there is in fact a kind of
> interesting issue there: compoundify could even be viewed as WordNet
> independent, since it really just needs a list of compounds from
> somewhere... and I think there are possibly some issues like that with
> the Freq.pl programs. These are not really compoundify issues, but
> functionality that is primarily text based and doesn't need WordNet,
> and indeed there is some redundancy between those programs. Eliminating
> that redundancy has been on my list of things to do for some time, and
> I think it would really be a nice enhancement to things...
>
> Anyway, the reason I was thinking about compoundify in a WordNet
> independent sense is that Text-Similarity wants to have a compounding
> operation included, but it doesn't currently have one (or the one it
> has doesn't seem to actually work...). So... I don't know if it would
> make sense at all to think about a WordNet tool that just provided a
> list of compounds, and then a separate Text::Compoundify module... That
> actually almost feels like a QueryData method... getCompounds or
> something... hmmm...
>
> As to other WordNet functionality, I just added some constants for my
> hash values to refer to WordNet versions more conveniently. I was kind
> of wishing that WordNet-QueryData would go ahead and do that conversion
> so that we could get reliable values from version() again, and in fact
> that's what confused me earlier today. I had thought that was done, but
> I don't think it was... so anyway, that does seem like an operation
> that users (or maybe developers) might end up doing: figuring out a
> table of hash-to-WordNet-version values...
>
> I wonder too, did we ever figure out if the hash values differ on
> Windows? I suppose they must... so that's another possible point of
> failure, but... well, one thing at a time. :)
>
> Otherwise, I think we've done a pretty good job of "exposing" the
> functionality of WordNet::Similarity so that people can get at some of
> the interesting functions (like finding hypernym trees, depths, etc.),
> and I don't notice much duplication any more except, as you say, in
> some of the /utils... but it's certainly worth thinking about,
> especially as both SenseRelate and maybe even Text-Similarity start to
> grow up a bit and make use of different sorts of functionality...
>
> Thanks!
> Ted
>
> On Wed, Apr 9, 2008 at 9:17 PM, Siddharth Patwardhan <si...@cs...> wrote:
> > > WordNet::Tools (a module included in WordNet::Similarity) is something
> > > we will need to exploit in WordNet::SenseRelate. It does two things
> > > that are important for us there: providing reliable version
> > > information, and then doing compoundify. We do compoundify in many
> > > different modules, but I think it makes sense to centralize it in one
> > > place, and I think that place is WordNet::Tools...
> >
> > Right. That was the motivation behind creating WordNet::Tools:
> > centralizing some common functions. Compoundify was present in many
> > different modules and programs *within* WordNet::Similarity itself.
> > And we updated the code to make it faster (twice, I think), and each
> > time we had to change all the different instances of the same
> > function. So, we centralized it into WordNet::Tools.
> >
> > On that note, if you come across any other WordNet-specific function
> > that can be centralized, you may want to consider putting it into
> > WordNet::Tools. (Hmm... now that I think about it, there is quite a
> > bit of redundancy in the *Freq.pl programs... I wonder how much of
> > that is WN-specific.)
> >
> > -- Sid.
>
> --
> Ted Pedersen
> http://www.d.umn.edu/~tpederse
>
> _______________________________________________
> senserelate-developers mailing list
> sen...@li...
> https://lists.sourceforge.net/lists/listinfo/senserelate-developers
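Ted's idea of a WordNet-independent compoundify that "just needs a list of compounds from somewhere", paired with a getCompounds-style accessor for the internal list Sid describes, might look roughly like the Python sketch below. The class name `Compounder`, its method names, and the greedy longest-match strategy are assumptions for illustration, not the WordNet::Tools API; joining words with underscores follows WordNet's own convention for multiword entries such as hot_dog.

```python
class Compounder:
    """WordNet-independent compoundify sketch that works from any compound list.

    Hypothetical class; not the WordNet::Tools API.
    """

    def __init__(self, compounds):
        # Store compounds in WordNet's underscore form, lowercased for matching.
        self._compounds = {c.lower().replace(" ", "_") for c in compounds}
        # The longest compound (in words) bounds the search window.
        self._max_len = max((c.count("_") + 1 for c in self._compounds), default=1)

    def get_compounds(self):
        """Return the internal compound list (cf. the proposed getCompounds)."""
        return sorted(self._compounds)

    def compoundify(self, text):
        """Greedily join the longest known multiword sequence at each position."""
        words = text.split()
        out = []
        i = 0
        while i < len(words):
            # Try the longest window first so "new york city" beats "new york".
            for n in range(min(self._max_len, len(words) - i), 1, -1):
                if "_".join(words[i:i + n]).lower() in self._compounds:
                    out.append("_".join(words[i:i + n]))  # keep original casing
                    i += n
                    break
            else:
                # No compound starts here: emit the single word unchanged.
                out.append(words[i])
                i += 1
        return " ".join(out)


if __name__ == "__main__":
    c = Compounder(["new york", "new york city", "hot dog"])
    print(c.compoundify("I ate a hot dog in New York City"))
    # I ate a hot_dog in New_York_City
```

Because the class takes any iterable of compounds, the list could be generated from WordNet at startup (as WordNet::Tools does), read from a compounds.txt file, or come from somewhere else entirely, which is the separation Ted suggests for Text-Similarity.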