From: Ted P. <dul...@gm...> - 2008-04-10 17:49:03
Hi Sid,

See comments below...

On Wed, Apr 9, 2008 at 11:19 PM, Siddharth Patwardhan <si...@cs...> wrote:
> Hi Ted,
>
> So... a getCompounds method could very easily be added to
> WordNet::Tools. Currently, the WordNet::Tools constructor (new())
> builds a list of compounds internally for use with compoundify.
> This is different from how we did things before -- i.e., first
> generate a compounds.txt file using compounds.pl, and then
> use the compounds.txt in compoundify. The compoundify in
> WordNet::Tools simply generates a list of compounds from WordNet
> at startup, and then uses this list as its list of compounds.
> A new getCompounds method in WordNet::Tools could simply return this
> internal list, if required.

Ah, very interesting. I didn't realize this was how things were
structured now, but it makes good sense. I think that compounds.pl
program is very neat, and having a getCompounds method would actually
be potentially very useful for users. I think it's a natural enough
question to ask - that is, what are the compounds in WordNet...so
having that as a part of a Tools package makes good sense to me.

I think what Text::Similarity needs is probably independent of
WordNet - that is, it really just needs the string matching logic used
in compoundify - given a list of compounds, find them in a given text -
so in that case a getCompounds method would be very handy (if we wanted
to find WordNet compounds), or the user could provide their own list
from some other source and then match in about the same way. The
matching logic is already in Text-Similarity and in fact it might work
as it is; I haven't looked at that too deeply as yet...

So, anyway, I do think a getCompounds method in WordNet::Tools could be
very useful for those modules like Text-Similarity that might like to
go looking for WordNet compounds. Probably we wouldn't want to build in
a dependence on WordNet-Similarity though, so we'd just run that once
and then provide the compounds to Text-Similarity. Having that list in
a "Perl form" would be nice, as that would make it easy to send into
Text-Similarity...

> I just wanted to point out that the "hash-code" for the different
> versions of WordNet isn't really a standard. It was just something we
> (rather Ben Haskel) came up with, to generate an identifier for
> WordNet, from the WordNet data files. We just run an SHA1 hash function
> over the WordNet data file names and their sizes, to get this unique
> identifier. But someone could easily come up with a different way to
> generate a WordNet version identifier. Also, if it so happens that two
> different WordNet versions have data files with the exact same sizes,
> then they would get the same identifier. So, this method is not perfect.
> But I think it works, in general, since different versions of WordNet
> are unlikely to have the exact same file sizes.

Thanks for clarifying this - I do think the SHA1 idea *should* provide
unique identifiers, and in fact I think it might even be overly unique,
in that a Windows WordNet 2.0 and a Unix WordNet 2.0 should have
different values (I assume there must be some formatting differences
that cause them to be rather different). But, I actually think that is
good, in that it would make it possible to identify the exact WordNet
version being used. But I do agree, we'll want to be on the alert for a
WordNet that somehow has the same SHA1 value as another version. It
doesn't seem likely, unless WordNet were to release a version that
differed only in respect to documentation and not the data files, but
that doesn't seem to be their style.

> Anyway, if more and more people start using it, it could become a
> standard. But I guess for now maybe it would be better to refer to it
> as our internal WordNet version identifier, or something.

Agreed - best to make it clear we are the ones producing that hash, and
not potentially confuse WordNet users who then expect that elsewhere.

Thanks!
Ted

> -- Sid.
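
A rough sketch of the identifier scheme described in this message, written in Python purely for illustration. It assumes the identifier is a SHA1 digest over the sorted data file names and their sizes. The exact set of files, their ordering, and the string format that WordNet::Tools feeds to the hash are not given in this thread, so the function name `wordnet_fingerprint` and the `name:size` formatting are hypothetical.

```python
import hashlib
import os
import tempfile


def wordnet_fingerprint(dict_dir):
    """Return a SHA1 hex digest over the names and sizes of files in dict_dir.

    Hypothetical scheme: only file names and sizes are hashed, never the
    file contents, which mirrors the limitation discussed in the thread.
    """
    digest = hashlib.sha1()
    for name in sorted(os.listdir(dict_dir)):
        path = os.path.join(dict_dir, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        size = os.path.getsize(path)
        digest.update(f"{name}:{size}\n".encode("utf-8"))
    return digest.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        # Same file name and same size, but different contents.
        for directory, text in ((a, "dog"), (b, "cat")):
            with open(os.path.join(directory, "data.noun"), "w") as fh:
                fh.write(text)
        # Reports whether the two directories get the same identifier.
        print(wordnet_fingerprint(a) == wordnet_fingerprint(b))
```

Because only names and sizes go into the digest in this sketch, two directories whose files have identical names and sizes collide even when the contents differ, while any change in a file's size (for example, different line endings) changes the result.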
> On Wed, 2008-04-09 at 21:39 -0500, Ted Pedersen wrote:
> > Hi Sid,
> >
> > Yes, I think WordNet::Tools is terrific...there is in fact a kind of
> > interesting issue there - compoundify could even be viewed as
> > WordNet independent - it really just needs a list of compounds from
> > somewhere....and I think there are possibly some issues like that
> > with the Freq.pl programs, not really compoundify issues, but
> > functionality that is primarily text based and doesn't need WordNet,
> > and indeed there is some redundancy between those programs.
> > Eliminating that redundancy has been on my list of things to do for
> > some time, and I think it would really be a nice enhancement to
> > things...
> >
> > Anyway, the reason I was thinking about compoundify in a WordNet
> > independent sense is that Text-Similarity wants to have a
> > compounding operation included, but it doesn't currently have one
> > (or the one it has doesn't seem to actually work...) So..I don't
> > know if it would make sense at all to think about a WordNet Tool
> > that just provided a list of compounds and then a separate
> > Text::Compoundify module...That actually almost feels like a
> > QueryData method....getCompounds or something....hmmm....
> >
> > As to other WordNet functionality, I just added some constants for
> > my hash values to refer to wordnet versions more conveniently - I
> > was kind of wishing that WordNet-QueryData would go ahead and do
> > that conversion so that we could get reliable values from version()
> > again, and in fact that's what confused me earlier today. I had
> > thought that was done but I don't think it was...so anyway....that
> > does seem like an operation that users? (maybe developers) might
> > end up doing - figuring out a table of hash to wordnet version
> > values....
> >
> > I wonder too, did we ever figure out if the hash values differ on
> > Windows? I suppose they must....so that's another possible point of
> > failure, but...well, one thing at a time. :)
> >
> > Otherwise, I think we've done a pretty good job of "exposing" the
> > functionality of WordNet Similarity so that people can get at some
> > of the interesting functions (like finding hypernym trees, depths,
> > etc.), and I don't notice much duplication any more except as you
> > say in some of the /utils... but, certainly worth thinking about
> > especially as both SenseRelate and maybe even Text-Similarity start
> > to grow up a bit and make use of different sorts of
> > functionality....
> >
> > Thanks!
> > Ted
> >
> > On Wed, Apr 9, 2008 at 9:17 PM, Siddharth Patwardhan <si...@cs...> wrote:
> > > > WordNet::Tools (a module included in WordNet::Similarity) is
> > > > something we will need to exploit in WordNet::SenseRelate - it
> > > > does two things that are important for us there, providing
> > > > reliable version information, and then doing compoundify. We do
> > > > compoundify in many different modules, but I think it makes
> > > > sense to centralize it in one place, and I think that place is
> > > > WordNet::Tools...
> > >
> > > Right. That was the motivation behind creating WordNet::Tools...
> > > centralizing some common functions. Compoundify was present in many
> > > different modules and programs *within* WordNet::Similarity itself.
> > > And we updated the code to make it faster (twice I think). And each
> > > time we had to change all the different instances of the same
> > > function. So, we centralized it into WordNet::Tools.
> > >
> > > On that note, if you come across any other WordNet-specific function
> > > that can be centralized, you may want to consider putting it into
> > > WordNet::Tools. (Hmmm... now that I think about it, there is quite
> > > a bit of redundancy in the *Freq.pl programs... I wonder how much
> > > of that is WN-specific.)
> > >
> > > -- Sid.
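
The WordNet-independent matching step discussed in this thread (given a list of compounds, find them in a text) could look roughly like the following. This is a minimal sketch in Python, not the compoundify code from WordNet::Tools or Text-Similarity. It assumes compounds are supplied as underscore-joined strings such as "white_house", and it uses a greedy, longest-match-first scan, which is one plausible strategy rather than the one those modules necessarily use. The function name and the `max_len` parameter are hypothetical.

```python
def compoundify(text, compounds, max_len=None):
    """Join runs of words that form a known compound with underscores.

    compounds: iterable of lowercase underscore-joined strings ("new_york").
    max_len:   longest compound, in words, to look for (derived if None).
    Matching is case-insensitive; matched compounds are emitted in lowercase,
    unmatched words are left as they were.
    """
    known = set(compounds)
    if max_len is None:
        max_len = max((c.count("_") + 1 for c in known), default=1)

    words = text.split()
    out = []
    i = 0
    while i < len(words):
        # Try the longest candidate first, down to two-word compounds.
        for n in range(min(max_len, len(words) - i), 1, -1):
            candidate = "_".join(words[i:i + n]).lower()
            if candidate in known:
                out.append(candidate)
                i += n
                break
        else:
            # No compound starts at this word; keep it unchanged.
            out.append(words[i])
            i += 1
    return " ".join(out)


if __name__ == "__main__":
    wordnet_like = ["white_house", "new_york", "new_york_city"]
    print(compoundify("he visited the white house in new york city", wordnet_like))
    # → he visited the white_house in new_york_city
```

Since the function only needs an iterable of strings, the list could come from a getCompounds-style export or from any other source, which is the separation between the compound list and the matching logic that the thread is circling around.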
-- 
Ted Pedersen
http://www.d.umn.edu/~tpederse