Bug in Word.getSenseKey()

    Keith Suderman - 2008-10-16

    I believe I have found a bug in RC2 in the way WordNet sense keys are retrieved (Word.getSenseKey()). I was trying to retrieve the sense keys for all the words in the synsets for the adjective "fair". The following code exposes the bug:

       public void test() throws JWNLException {
          // Look up "fair" as an adjective and walk every word of every sense.
          IndexWord indexWord = dictionary.getIndexWord(POS.ADJECTIVE, "fair");
          for (Synset synset : indexWord.getSenses()) {
             for (Word word : synset.getWords()) {
                System.out.println("   " + word.getLemma() + " " + word.getSenseKey());
             }
          }
       }

    An exception is thrown with the following stack trace:

        at java.util.StringTokenizer.nextToken(StringTokenizer.java:332)
        at net.didion.jwnl.dictionary.FileBackedDictionary.getIndexLineWord(FileBackedDictionary.java:440)
        at net.didion.jwnl.dictionary.FileBackedDictionary.getSenseKey(FileBackedDictionary.java:423)
        at net.didion.jwnl.data.Word.getSenseKey(Word.java:167)
        at ANC.experiment.JWNLTest.test(JWNLTest.java:100)
        at ANC.experiment.JWNLTest.run(JWNLTest.java:51)
        at ANC.experiment.JWNLTest.main(JWNLTest.java:216)

    The problem is that the lines in the data.adj file may also contain a "syntactic marker" (one of (a), (p), or (ip)) appended to the lemma. In this case the lemma is "fair(a)". If the lemma contains one of these markers, the function Grep.grep(offset, lemma) fails to match the lemma in the line and returns an empty string, which in turn causes StringTokenizer.nextToken() to fail.
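
    The last step of that failure can be reproduced in isolation. The sketch below is not JWNL code: EmptyLineDemo and its methods are hypothetical names, and the "fair a" input is only an illustrative placeholder, not a real index line. It shows that calling nextToken() on a StringTokenizer built from an empty string throws NoSuchElementException, which is what happens once the lookup returns an empty line.

```java
import java.util.NoSuchElementException;
import java.util.StringTokenizer;

// Hypothetical demo class; not part of JWNL.
final class EmptyLineDemo {

    // Returns the first whitespace-delimited token of a line.
    static String firstToken(String line) {
        return new StringTokenizer(line).nextToken();
    }

    // Returns true if tokenizing an empty line throws NoSuchElementException.
    static boolean failsOnEmptyLine() {
        try {
            firstToken("");
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Illustrative placeholder input, not a real WordNet index line.
        System.out.println(firstToken("fair a"));
        System.out.println(failsOnEmptyLine());
    }
}
```

    A caller that may receive an empty line could check hasMoreTokens() before calling nextToken(), although that would only hide the failed match rather than fix it.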

    I've fixed the problem for myself by modifying the constructors for the Word class to parse the lemma looking for any syntactic markers. If a marker is found it is stripped from the lemma and saved in a _position field.
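
    For anyone who wants to try the same approach, here is a minimal sketch of the parsing step. It is not the actual patch to the Word class: MarkedLemma, parse, and the field names are hypothetical, and it assumes the marker always appears as a trailing "(a)", "(p)", or "(ip)" on the lemma.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the real fix lives in the Word constructors.
final class MarkedLemma {

    // Assumption: the marker is a trailing (a), (p), or (ip).
    private static final Pattern MARKER =
            Pattern.compile("^(.+)\\((a|p|ip)\\)$");

    final String lemma;     // lemma with any marker stripped
    final String position;  // "a", "p", "ip", or null if no marker

    private MarkedLemma(String lemma, String position) {
        this.lemma = lemma;
        this.position = position;
    }

    // Splits a raw lemma such as "fair(a)" into lemma and marker.
    static MarkedLemma parse(String raw) {
        Matcher m = MARKER.matcher(raw);
        if (m.matches()) {
            return new MarkedLemma(m.group(1), m.group(2));
        }
        return new MarkedLemma(raw, null);
    }

    public static void main(String[] args) {
        MarkedLemma parsed = parse("fair(a)");
        System.out.println(parsed.lemma + " / " + parsed.position);
    }
}
```

    With the marker stored separately, the bare lemma ("fair") is what gets passed on to the lookup, so the match against the line no longer has to account for the "(a)" suffix.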

