I believe I have found a bug in RC2 in the way WordNet sense keys are retrieved (Word.getSenseKey()). I was trying to retrieve the sense keys for all the words in the synsets of the adjective "fair". The following code exposes the bug:
public void test() throws JWNLException {
    IndexWord indexWord = dictionary.getIndexWord(POS.ADJECTIVE, "fair");
    // Walk every synset of "fair" and print each member word with its sense key.
    for (Synset synset : indexWord.getSenses()) {
        for (Word word : synset.getWords()) {
            System.out.println(" " + word.getLemma() + " " + word.getSenseKey());
        }
    }
}
The following exception is thrown.
The problem is that lines in the data.adj file may also carry a "syntactic marker" (one of (a), (p), or (ip)) appended to the lemma. In this case the lemma is stored as "fair(a)". When the lemma carries one of these markers, Grep.grep(offset, lemma) fails to match the lemma in the line and returns an empty string, which in turn causes StringTokenizer.nextToken() to fail.
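To illustrate the last step of that chain, here is a minimal standalone sketch using only java.util.StringTokenizer. The class and method names (EmptyLineDemo, failsOnFirstToken) are made up for the example and are not part of JWNL. It only shows the standard-library behaviour when the tokenizer is given an empty string, not the exact exception JWNL ends up reporting.

```java
import java.util.NoSuchElementException;
import java.util.StringTokenizer;

// Hypothetical demo class, not part of JWNL.
final class EmptyLineDemo {

    // Returns true when asking for the first token of the line fails,
    // which is what happens for an empty (or all-whitespace) string.
    static boolean failsOnFirstToken(String line) {
        StringTokenizer tokenizer = new StringTokenizer(line);
        try {
            tokenizer.nextToken();
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // An empty string stands in for the empty result of a failed match.
        System.out.println(failsOnFirstToken(""));
        // An ordinary space-separated line tokenizes without error.
        System.out.println(failsOnFirstToken("some tokens here"));
    }
}
```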
I've fixed the problem for myself by modifying the constructors for the Word class to parse the lemma looking for any syntactic markers. If a marker is found it is stripped from the lemma and saved in a _position field.
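A rough sketch of the kind of parsing I mean, written as a standalone helper rather than as the actual Word constructor change. The class name AdjectiveLemma, the regular expression, and storing the marker without its parentheses are all assumptions made for this example; only the three marker values (a), (p) and (ip) come from the description above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of JWNL: splits a raw adjective lemma
// such as "fair(a)" into the bare lemma and its syntactic marker.
final class AdjectiveLemma {

    // A lemma followed by exactly one trailing marker: (a), (p) or (ip).
    private static final Pattern MARKER = Pattern.compile("^(.*)\\((a|p|ip)\\)$");

    final String lemma;
    final String position; // marker without parentheses, or null if absent

    private AdjectiveLemma(String lemma, String position) {
        this.lemma = lemma;
        this.position = position;
    }

    static AdjectiveLemma parse(String raw) {
        Matcher m = MARKER.matcher(raw);
        if (m.matches()) {
            // Strip the marker from the lemma and keep it separately.
            return new AdjectiveLemma(m.group(1), m.group(2));
        }
        // No marker: the lemma is used unchanged.
        return new AdjectiveLemma(raw, null);
    }

    public static void main(String[] args) {
        AdjectiveLemma parsed = parse("fair(a)");
        System.out.println(parsed.lemma + " / " + parsed.position); // prints: fair / a
    }
}
```

With the marker stripped, the bare lemma can be used for display while the marker stays available for whatever lookup needs the original form.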
I fixed this in my fork at http://extjwnl.sourceforge.net/