Hi Sid,

Isn't the main problem that the hash codes are solving that WordNet 3.0 in some cases has a copyright notice that says WordNet 2.1? I think that's how this started, and how the version number became unreliable.  Other than that, I *think* Jason's version() function was reliable, but of course that's a very major "other than that"...

So, I guess my sense of it has been that the hash code lets us tell a 3.0 from a 2.1 WordNet. Now, once we have made the change to using hash codes, then of course we can use them to tell apart the other versions too, but I think in the end there is still a need to know if you are using 1.7.1 or 2.0 or 3.0, in order to choose which version of QueryData or WordNet-Similarity to use, if nothing else.

Actually, I guess the way I think of it is that the "real" version of WordNet still remains 1.7 or 2.1, and that the hash code is essentially a bug fix for the "3.0 as 2.1" labeling problem. Since the hash code differentiates more finely than the version number does, we get into this "distribution" issue - that is, a 2.1 Windows and a 2.1 Linux will have different hash codes, but at some level those are still the same version of WordNet, at least in terms of content. So I guess I would think of the 2.0 style number as being the "content" label, while the hash code is essentially the distribution indicator. That's actually a good thing, as you could then recognize if a 2.1 WordNet had come from some previously unknown source.

So my thinking on the version() function was that it becomes a master list of the hash codes we know about. I think at some point those hash codes need to be associated with an "official" WordNet version number, because at least for our purposes that dictates which version of WordNet-Similarity to use. It is probably better to maintain that centrally in something like WordNet-QueryData, rather than having each application create its own mapping table (this hash code is associated with version 2.1, etc.).
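
To make that concrete, here is a rough sketch of the kind of master table I have in mind. It is only an illustration: %KNOWN_WORDNETS and describe_wordnet are made-up names, not anything that exists in QueryData or WordNet::Tools. The three hash codes are the ones Sid posted earlier in the thread (with the mail line wraps taken out), and the distribution labels are left as 'unknown' since nobody has recorded them yet.

```perl
use strict;
use warnings;

# Hypothetical master table: hash code => [content version, distribution].
# The hash codes come from Sid's earlier message in this thread; the
# 'unknown' distribution labels are placeholders, not recorded facts.
my %KNOWN_WORDNETS = (
    'US9EUGPpJj2jVr+fRrZqQX6vcGs' => ['2.0', 'unknown'],
    'LL1BZMsWkr0YOuiewfbiL656+Q4' => ['2.1', 'unknown'],
    'eOS9lXC6GvMWznF1wkZofDdtbBU' => ['3.0', 'unknown'],
);

# Hypothetical helper (an assumption, not an existing method).
# Returns (wnver, distribution, hashcode); wnver and distribution are
# undef when the hash code is not in the table, so a WordNet from a
# previously unknown source is still identifiable by its hash code.
sub describe_wordnet {
    my ($hashcode) = @_;
    my $entry = $KNOWN_WORDNETS{$hashcode};
    return (undef, undef, $hashcode) unless $entry;
    return (@$entry, $hashcode);
}

my ($wnver, $distribution, $hashcode) =
    describe_wordnet('eOS9lXC6GvMWznF1wkZofDdtbBU');
print "WordNet $wnver ($distribution), hash $hashcode\n";
```

The point of returning undef for the version rather than dying is that an unrecognized hash code is useful information in itself, and the caller can decide what to do about it.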

So, I guess I do think that the WordNet version numbers are still quite meaningful (2.0 versus 3.0, etc) and that we want to be able to get those in a fairly reliable and simple way.

It seems like we might have a Linux kernel versus Linux distribution issue here. I'd say that the WordNet content is like the kernel, and is always the same for a particular version regardless of the packaging around it, which is the distribution. So in a sense the WordNet version number is our kernel id (2.1) and the hash code is the distribution identifier. The only problem is that the kernel authors forgot to update their version number in some cases in 3.0, and so we lost the ability to find the WordNet version number directly, and so we end up in this kind of peculiar situation.

Anyway, that's my thinking behind trying to save the "WordNet version number" - I think the other side of the coin is that there is some segment of users that simply won't care what the hash code or distribution is. They'll want to know if it's WordNet 2.1 or 3.0, and this would be an easy way for them to find out (if version returns it). Otherwise, it becomes the user's problem to look at the distribution tar file or install directory, or copyright notice, which is actually what I ended up doing when creating my little mapping table that sort of started this line of thought.

I truly regret that they didn't update their copyright notice in 3.0 :) I think that probably would have avoided all this. But now that we are here, being able to make this kernel versus distribution kind of distinction seems rather nice.

So to be specific, I guess I'm trying to lay out a case here for version returning both a WordNet version number and a hash code, and perhaps even a distribution id string. I get the impression you are suggesting that version should just return the hash code? Or am I misunderstanding that?


On Sun, Apr 13, 2008 at 2:38 AM, Siddharth Patwardhan <sidd@cs.utah.edu> wrote:
Hi Ted,

So, all of this still seems to assume that there is a single entity
distributing WordNet, and there are clear "version numbers" on
WordNet. For example, is the debian/ubuntu distribution the same
or a different version? If someone uses the wordnet tools and creates
(and distributes) a domain-specific subset (say for the biomedical
domain), will that get a clear version number in this sequence?
The hash-code gets around this somewhat... and a subset of these
hash-codes can be definitively tied to their respective official
versions of WordNet. But, I don't know if that is something we
can rely on at this point. In other words, I'm not too sure how
wise it would be to have any code relying on specific version
numbers anymore -- now that even the WordNet folk have announced
that those are not reliable numbers.

So, I guess we need a unique identifier for each variation/release
of WordNet, and perhaps an official version number where possible.
All of this, ideally, from WordNet::QueryData. I just don't think
we should have code that relies on specific "official version
numbers" of WordNet.

I think the reason why Jason removed the version method, was that it
wasn't returning reliable information anymore. I don't know how easy
it would be to convince him to put some of this functionality back.

-- Sid.

On Fri, 2008-04-11 at 23:53 -0500, Ted Pedersen wrote:
> Hi Sid,
> Yes, I think WordNet-QueryData is the best place for this, especially
> since it is where we used to get our version() from anyway. I do think
> that we might want to have version return both the hash value and a
> "human readable" value, simply because it's sometimes nice to know
> which version of WordNet you are using in terms of content (2.1) versus
> the particular distribution (Windows, etc.) So maybe version() could
> return an array
>    ($wnver, $distribution, $hashcode) = $qd -> version();
> As this would let us (and others) look at the version from various
> points of view. wnver would be the traditional number (2.1, 3.0)
> distribution could be a string that says "unix, windows, prolog,
> whatever" and hashcode is of course the hashcode. That seems like it
> would cover all bases....I suppose we could even include stuff like
> release date (of WordNet) but I don't know if we'd need to go quite
> that far...
> But, I do think this feels like QueryData functionality to me...I think
> if we worked out a prototype (what it returns, etc.) and then presented
> that to Jason, then we could see what he thinks, and also decide how to
> implement (if he thinks it belongs in QueryData).
> Thanks!
> Ted
> On Fri, Apr 11, 2008 at 4:59 AM, Siddharth Patwardhan
> <sidd@cs.utah.edu> wrote:
>         Right. I also think that some identifier for each unique
>         distribution is required. In fact that was the main reason for
>         creating the hashCode method and WordNet::Tools. However, I
>         think going back and creating some method to get the WordNet
>         version number would be rather restrictive. What I mean by this
>         is... as I understand it the WordNet folk are envisioning many
>         many different distributions of WordNet created by people using
>         their WordNet-building tools. In such a situation, the "version
>         number" doesn't have much meaning... but a unique identifier
>         associated with a WordNet distribution could serve the same
>         purpose.
>         If I had to rank the various options... I'd say somehow
>         convincing Jason Rennie to include a WordNet identifier function
>         in QueryData would be the cleanest solution -- since
>         WordNet::QueryData is our window or door to the WordNet world.
>         The second option is the current solution of using
>         WordNet::Tools... which, to me, is a layer over QueryData for
>         complex WordNet functions, or for functions we would like to see
>         in QueryData, but can't. Ideally, this package should be
>         distributed by itself, rather than inside WordNet::Similarity.
>         I'll have to find out if there is a clean/standard way to
>         separate a module from a package and put it into its own
>         package.
>         The WordNet::Version module seems like a bit of overkill to me.
>         Having a completely separate package just to get the version of
>         WordNet seems a little weird.
>         So, should we talk to Jason Rennie first?
>         Thanks.
>         -- Sid.
>         On Thu, 2008-04-10 at 12:31 -0500, Ted Pedersen wrote:
>         > Hi Sid,
>         >
>         > I think something like this would be a good idea....and then
>         > we could gradually add other hash values as we install/test
>         > with other versions of WordNet - or maybe we could even have a
>         > little program for users to run, and we could ask them to run
>         > this for us and report back their hash value for whatever
>         > version of WordNet they are using, especially if they are on
>         > Windows or using some other version (like maybe the Debian
>         > package of WordNet....)
>         >
>         > I do think WordNet version mis-identification is a bigger
>         > problem than people often appreciate - it's very easy to get
>         > all that confused given the slightly obscure way that WordNet
>         > needs to be installed...it's happened to me a few times in
>         > fact...
>         >
>         > So...I'm turning this into a major project all of a sudden. :)
>         > I do think some sort of version identification method should
>         > exist somewhere though....these seem like the most likely
>         > candidates...
>         >
>         > WordNet::Tools (existing)
>         > WordNet::Version (new?)
>         > WordNet::QueryData (existing, home of deprecated version)
>         >
>         > Thanks,
>         > Ted
>         >
>         > On Wed, Apr 9, 2008 at 11:22 PM, Siddharth Patwardhan
>         > <sidd@cs.utah.edu> wrote:
>         >         Maybe these could be added to WordNet::Tools?
>         >
>         >         > use constant WNver20 => 'US9EUGPpJj2jVr+fRrZqQX6vcGs';
>         >         > use constant WNver21 => 'LL1BZMsWkr0YOuiewfbiL656+Q4';
>         >         > use constant WNver30 => 'eOS9lXC6GvMWznF1wkZofDdtbBU';
>         >
>         >
>         >         (And we could generate hash-codes for 1.5, 1.6, 1.7,
>         >         1.7.1, if those still exist, and add those too.)
>         >
>         >         -- Sid.
>         >
>         >
>         >
>         >
>         > --
>         > Ted Pedersen
>         > http://www.d.umn.edu/~tpederse
>         >
>         > _______________________________________________
>         senserelate-developers mailing list
>         senserelate-developers@lists.sourceforge.net
>         https://lists.sourceforge.net/lists/listinfo/senserelate-developers
> --
> Ted Pedersen
> http://www.d.umn.edu/~tpederse

Ted Pedersen