I tried JWNL on GermaNet (a German counterpart of WordNet, very similar to it but not quite identical in database structure). I have solved some problems, but others remain. Has anyone (or do you know anyone who has) got it working and would be willing to share information?
Can you be more specific on the problems you had? One of the goals of JWNL is to support any WordNet-like dictionary, so your experiences would be very helpful.
OK, here is what I have so far. The first thing I did was rename the dictionary files to their obvious English equivalents, just to see what would happen: "index.nomen" to "noun.idx", etc. Some files have no equivalent. For example, there is no counterpart of "frames.vrb", so I had to comment out VerbFrame.initialize() in JWNL.java.
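Instead of renaming the GermaNet files in place, it may be safer to copy them into a separate folder under the WordNet names and point JWNL's dictionary path at that folder, so the original GermaNet distribution stays untouched. Below is a minimal sketch of that idea. The class and method names are hypothetical, and only the "index.nomen" to "noun.idx" pair comes from this post; the rest of the table has to be filled in from your own GermaNet release.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.Map;

public class GermaNetFileNames {
    // Hypothetical mapping table: only this pair is known from the post.
    // Add further GermaNet -> WordNet file name pairs for your release.
    static final Map<String, String> GERMANET_TO_WORDNET = Map.of(
        "index.nomen", "noun.idx"
    );

    /**
     * Copies every mapped GermaNet file found in srcDir into dstDir under its
     * WordNet-style name. Returns the number of files copied; missing source
     * files are reported on stderr and skipped.
     */
    static int copyWithWordNetNames(Path srcDir, Path dstDir) throws IOException {
        Files.createDirectories(dstDir);
        int copied = 0;
        for (Map.Entry<String, String> e : GERMANET_TO_WORDNET.entrySet()) {
            Path src = srcDir.resolve(e.getKey());
            if (Files.exists(src)) {
                Files.copy(src, dstDir.resolve(e.getValue()),
                           StandardCopyOption.REPLACE_EXISTING);
                copied++;
            } else {
                System.err.println("missing GermaNet file: " + src);
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: GermaNetFileNames <germanet-dir> <output-dir>");
            return;
        }
        int n = copyWithWordNetNames(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(n + " file(s) copied");
    }
}
```

Files that simply have no WordNet counterpart (like "frames.vrb") still need the JWNL side to skip them, as with the VerbFrame.initialize() change.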
In the German "noun.idx" records, the number of senses immediately precedes the sense offsets, whereas in WordNet there is an extra number in between:
"dog n 6 5 @ ~ #m #p %p 6 1 01752990 08300330 08227032 08119778 03398163 02358004"
Notice the '1' between the '6' and the six sense offsets (the tagsense_cnt field of the WordNet index format), whereas in German the '2' immediately precedes the two offsets:
"hund n 0 2 @ ~ 2 02439173 01623178".
So in createIndexWord() of class AbstractPrincetonDictionaryElementFactory, I commented out the tokenizer.nextInt() call that skips the extra number in the WordNet format. So far, so good: it sort of works.
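Commenting the call out breaks reading real WordNet files, so one option is to make the extra field conditional. The sketch below is a standalone parser, not JWNL's own code and not a patch to AbstractPrincetonDictionaryElementFactory; the class name and the hasTagSenseCount flag are hypothetical. It follows the WordNet index layout (lemma, pos, synset_cnt, p_cnt, the pointer symbols, sense_cnt, tagsense_cnt, offsets) and skips tagsense_cnt only when the flag says the file has one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

/**
 * Hypothetical standalone index-line parser (not JWNL code).
 * WordNet:  lemma pos synset_cnt p_cnt [ptr_symbol...] sense_cnt tagsense_cnt offset...
 * GermaNet: same layout as seen in the post, but without tagsense_cnt.
 */
public class IndexLineParser {
    public final String lemma;
    public final String pos;
    public final List<String> pointerSymbols = new ArrayList<>();
    public final List<String> offsets = new ArrayList<>();

    private IndexLineParser(String lemma, String pos) {
        this.lemma = lemma;
        this.pos = pos;
    }

    /** @param hasTagSenseCount true for Princeton WordNet files, false for the GermaNet files */
    public static IndexLineParser parse(String line, boolean hasTagSenseCount) {
        StringTokenizer tok = new StringTokenizer(line.trim(), " ");
        IndexLineParser p = new IndexLineParser(tok.nextToken(), tok.nextToken());
        Integer.parseInt(tok.nextToken());           // synset_cnt: read and ignored here
        int pointerCount = Integer.parseInt(tok.nextToken());
        for (int i = 0; i < pointerCount; i++) {
            p.pointerSymbols.add(tok.nextToken());   // e.g. "@", "~", "#m"
        }
        int senseCount = Integer.parseInt(tok.nextToken());
        if (hasTagSenseCount) {
            tok.nextToken();                         // tagsense_cnt: WordNet only
        }
        for (int i = 0; i < senseCount; i++) {
            p.offsets.add(tok.nextToken());          // keep leading zeros as strings
        }
        return p;
    }

    public static void main(String[] args) {
        IndexLineParser dog = parse(
            "dog n 6 5 @ ~ #m #p %p 6 1 01752990 08300330 08227032 08119778 03398163 02358004", true);
        IndexLineParser hund = parse("hund n 0 2 @ ~ 2 02439173 01623178", false);
        System.out.println(dog.lemma + " " + dog.offsets);
        System.out.println(hund.lemma + " " + hund.offsets);
    }
}
```

The same idea could carry over to createIndexWord(): read a configuration flag and call tokenizer.nextInt() only when it is set, so one build handles both dictionaries.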
However, I get stuck in the PointerType class. My TYPES array contains the initialised PointerType values (such as ANTONYM and ANTONYM_KEY, plus NOUN, VERB, ADJECTIVE, ADVERB and LEXICALKEY_TO_POINTER_TYPE_MAP), and I have no idea which resource underlies the mapping to the percent signs, angle brackets, etc.
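On the PointerType question: if I remember right, JWNL does not hard-code the symbols in that class; keys such as ANTONYM_KEY are resolved through a properties resource bundle shipped inside the JWNL jar (JWNLResource.properties), so that file would be where "!", "@", "%p" and the rest live. Treat that as an assumption and check it against the source of your JWNL version. The symbols themselves are the standard Princeton WordNet pointer symbols from the wninput(5WN) documentation. As a debugging aid, here is a hypothetical standalone lookup table (not JWNL's PointerType) that you could run over the symbols in the GermaNet files to see which ones WordNet does not define.

```java
import java.util.Map;

/**
 * Hypothetical lookup of Princeton WordNet pointer symbols (per wninput(5WN)).
 * Not JWNL's PointerType; only meant for checking which symbols a file uses.
 */
public class PointerSymbols {
    static final Map<String, String> NAMES = Map.ofEntries(
        Map.entry("!", "antonym"),
        Map.entry("@", "hypernym"),
        Map.entry("~", "hyponym"),
        Map.entry("#m", "member holonym"),
        Map.entry("#s", "substance holonym"),
        Map.entry("#p", "part holonym"),
        Map.entry("%m", "member meronym"),
        Map.entry("%s", "substance meronym"),
        Map.entry("%p", "part meronym"),
        Map.entry("=", "attribute"),
        Map.entry("*", "entailment"),
        Map.entry(">", "cause"),
        Map.entry("^", "also see"),
        Map.entry("$", "verb group"),
        Map.entry("&", "similar to"),
        Map.entry("<", "participle of verb"),
        Map.entry("\\", "pertainym")
    );

    /** Returns the relation name, or an UNKNOWN marker for symbols WordNet does not define. */
    static String describe(String symbol) {
        return NAMES.getOrDefault(symbol, "UNKNOWN: " + symbol);
    }

    public static void main(String[] args) {
        // Pointer symbols taken from the "dog" and "hund" index lines in this thread
        for (String s : new String[] {"@", "~", "#m", "#p", "%p"}) {
            System.out.println(s + " -> " + describe(s));
        }
    }
}
```

Any symbol reported as UNKNOWN would need its own PointerType entry (and resource key) on the JWNL side.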
I'm not much of a Java expert, so please be kind to me...
A do-it-right-from-the-start approach would be better, but where is the start?