Does the MapBackedDictionary work?

Help
Anonymous
2012-10-15
2013-05-23

  • Anonymous
    2012-10-15

    I'm getting

    net.sf.extjwnl.JWNLException: Unable to install net.sf.extjwnl.dictionary.MapBackedDictionary
        at net.sf.extjwnl.dictionary.Dictionary.getInstance(Dictionary.java:232)
        at net.sf.extjwnl.dictionary.Dictionary.getMapBackedInstance(Dictionary.java:266)
        at example.ExtJWNLTest.main(ExtJWNLTest.java:28)
    Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
        at java.lang.reflect.Constructor.newInstance(Unknown Source)
        at net.sf.extjwnl.dictionary.Dictionary.getInstance(Dictionary.java:222)
        ... 2 more
    Caused by: net.sf.extjwnl.JWNLException: Could not open dictionary files
        at net.sf.extjwnl.dictionary.MapBackedDictionary.load(MapBackedDictionary.java:231)
        at net.sf.extjwnl.dictionary.MapBackedDictionary.<init>(MapBackedDictionary.java:40)
        ... 7 more
    Caused by: java.io.StreamCorruptedException: invalid stream header: 20203120
        at java.io.ObjectInputStream.readStreamHeader(Unknown Source)
        at java.io.ObjectInputStream.<init>(Unknown Source)
        at net.sf.extjwnl.princeton.file.PrincetonObjectDictionaryFile.openInputStream(PrincetonObjectDictionaryFile.java:118)
        at net.sf.extjwnl.princeton.file.PrincetonObjectDictionaryFile.openFile(PrincetonObjectDictionaryFile.java:175)
        at net.sf.extjwnl.dictionary.file.AbstractDictionaryFile.open(AbstractDictionaryFile.java:83)
        at net.sf.extjwnl.dictionary.file.DictionaryCatalog.open(DictionaryCatalog.java:83)
        at net.sf.extjwnl.dictionary.file.DictionaryCatalogSet.open(DictionaryCatalogSet.java:47)
        at net.sf.extjwnl.dictionary.MapBackedDictionary.load(MapBackedDictionary.java:229)
        ... 8 more
    

    …whether I do

    Dictionary.getMapBackedInstance
    

    or

    Dictionary.getInstance("...map_properties.xml")
    

    , and regardless of whether I point to WordNet 2.1 or 3.0. I'm passing exactly the same WordNet path that I pass to getFileBackedInstance (or put in file_properties.xml, respectively), and the file-backed version works fine in all cases. (My computer runs Windows Vista.) Do you have a solution?

    Second question: if I get the map-backed dictionary working and add new entries to it, does it write those new entries back to the WordNet directory, so that they survive a computer crash or restart?

     
  • First question.

    MapBackedDictionary requires that you first convert the dict folder into its own format, which is a Java-serialized hash map. Pointing it at the plain-text dict folder is what produces the StreamCorruptedException above.

    To convert the standard "dict" folder into map format, use dict2map (or dict2map.bat) from the bin folder of the distribution. First, configure the properties file to point to the dict folder, then invoke the tool:

    dict2map <properties file> <destination directory>

    The tool will write the map files into the <destination directory>. You can then use this directory with MapBackedDictionary: take map_properties.xml and change the <param name="dictionary_path" value="./data/map"/> parameter to point to it.
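
The `invalid stream header: 20203120` in the stack trace can be decoded by hand: the four bytes are the ASCII characters space, space, `1`, space, which is consistent with the loader having been handed a plain-text WordNet file (these typically open with a numbered license header) rather than a serialized map. A minimal sketch of that check is below. The class and method names are invented for illustration and are not part of extJWNL.

```java
import java.io.ObjectStreamConstants;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of extJWNL.
class StreamHeaderCheck {

    /** True if the first four bytes are the Java serialization header (0xACED 0x0005). */
    static boolean looksJavaSerialized(byte[] head) {
        if (head.length < 4) {
            return false;
        }
        int magic = ((head[0] & 0xFF) << 8) | (head[1] & 0xFF);
        int version = ((head[2] & 0xFF) << 8) | (head[3] & 0xFF);
        return magic == (ObjectStreamConstants.STREAM_MAGIC & 0xFFFF)
                && version == ObjectStreamConstants.STREAM_VERSION;
    }

    /** Turns the hex digits reported by StreamCorruptedException into ASCII text. */
    static String decodeHeader(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // The header value from the stack trace.
        System.out.println("[" + decodeHeader("20203120") + "]"); // prints "[  1 ]"

        byte[] textFileStart = {0x20, 0x20, 0x31, 0x20};
        byte[] serializedStart = {(byte) 0xAC, (byte) 0xED, 0x00, 0x05};
        System.out.println(looksJavaSerialized(textFileStart));   // prints "false"
        System.out.println(looksJavaSerialized(serializedStart)); // prints "true"
    }
}
```

Reading the first four bytes of any file in the directory you pass as dictionary_path and feeding them to `looksJavaSerialized` should tell you whether that directory holds dict2map output or the original text files.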

     
  • Second question.

    MapBackedDictionary (like the database-backed one) is not connected to the file-based one. That is, changes you make to a map-based dictionary are not reflected in any other dictionary. However, you can save the map-based dictionary itself, and the next time you load it, it will contain the changes you saved. The same holds for the file-based one: change, save, and the changes are there on the next load.
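
To make the "change, save, load the changes next time" cycle concrete, here is a small stand-alone sketch of round-tripping a Java-serialized HashMap, the format mentioned above for map files. It is not extJWNL's actual save code: the class name is hypothetical, the map holds plain strings instead of dictionary elements, and a byte array stands in for the file on disk.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;

// Hypothetical illustration, not part of extJWNL.
class MapRoundTrip {

    /** Serializes the map the way ObjectOutputStream would write it to a file. */
    static byte[] save(HashMap<String, String> map) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(map);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads a map back from previously saved bytes. */
    @SuppressWarnings("unchecked")
    static HashMap<String, String> load(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (HashMap<String, String>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        HashMap<String, String> entries = new HashMap<>();
        entries.put("dog", "a domesticated canine");

        // "Change, save": add an entry, then persist the whole map.
        entries.put("blog", "a regularly updated web page");
        byte[] saved = save(entries);

        // "Load the changes next time": the new entry is part of the reloaded map.
        HashMap<String, String> reloaded = load(saved);
        System.out.println(reloaded.containsKey("blog")); // prints "true"
    }
}
```

Nothing is written until `save` is called, which mirrors the point above: unsaved edits live only in memory and would not survive a crash.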

     

  • Anonymous
    2012-10-18

    Thanks A.! The file-based dictionary uses a memory cache, yes? Where do I change the size of this cache? What is the performance compared to the MapBackedDictionary?

    Could you give me a code example of saving the dictionary after adding an entry, both for map & file based versions?

     
  • The file-based dictionary uses memory caches. One is in the morphological processor:

            <param name="morphological_processor" value="net.sf.extjwnl.dictionary.morph.DefaultMorphologicalProcessor">
                <param name="cache_capacity" value="150"/>
                <param name="operations">
    

    The other (main) cache is in the dictionary itself. Switch it on or off:

        <dictionary class="net.sf.extjwnl.dictionary.FileBackedDictionary">
            <param name="enable_caching" value="false"/>
            <param name="morphological_processor" value="net.sf.extjwnl.dictionary.morph.DefaultMorphologicalProcessor">
    

    Set the size of all dictionary caches at once:

        <dictionary class="net.sf.extjwnl.dictionary.FileBackedDictionary">
            <param name="cache_size" value="150"/>
            <param name="morphological_processor" value="net.sf.extjwnl.dictionary.morph.DefaultMorphologicalProcessor">
    

    Or set them one by one like this. Possible names are: index_word_cache_size, synset_word_cache_size, exception_word_cache_size

        <dictionary class="net.sf.extjwnl.dictionary.FileBackedDictionary">
            <param name="index_word_cache_size" value="150"/>
            <param name="morphological_processor" value="net.sf.extjwnl.dictionary.morph.DefaultMorphologicalProcessor">
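
For readers unsure what a cache size of 150 means in practice, the sketch below shows one common way to build a size-bounded cache in Java: a LinkedHashMap that evicts its least recently used entry once it grows past a fixed capacity. Whether extJWNL's caches use exactly this eviction policy is an assumption, and the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of a bounded cache, not extJWNL's implementation.
class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    BoundedCache(int capacity) {
        // accessOrder = true keeps entries ordered from least to most recently used.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; returning true drops the least recently used entry.
        return size() > capacity;
    }

    public static void main(String[] args) {
        BoundedCache<String, String> cache = new BoundedCache<>(2);
        cache.put("a", "first");
        cache.put("b", "second");
        cache.get("a");          // "a" is now the most recently used entry
        cache.put("c", "third"); // exceeds capacity, so "b" is evicted
        System.out.println(cache.keySet()); // prints "[a, c]"
    }
}
```

A larger capacity keeps more records in memory and avoids re-reading them from disk, at the cost of heap space.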
    
     
  • It is difficult to compare performance, given that you've said nothing about your particular circumstances, and they matter. In general: the map-based dictionary loads everything into memory at the start, so it takes some time to load. The file-based one caches record by record and is faster to start. So, if you expect to use the whole dictionary pretty aggressively, you might be better off using the map-based one. If you don't know, or you use just a part of it, you might be better off using the file-based one, maybe with a tweaked cache size.

    You need to test it yourself.
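
A rough way to run such a test is to time the same lookups against both dictionary types. The harness below is only a sketch: the class name is hypothetical, a plain HashMap lookup stands in for the real dictionary call, and a serious comparison would also need warm-up runs and a realistic word list.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical timing helper, not part of extJWNL.
class LookupTimer {

    /** Runs the task the given number of times and returns the average nanoseconds per run. */
    static long averageNanos(Runnable task, int repetitions) {
        if (repetitions <= 0) {
            throw new IllegalArgumentException("repetitions must be positive");
        }
        long start = System.nanoTime();
        for (int i = 0; i < repetitions; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / repetitions;
    }

    public static void main(String[] args) {
        // Stand-in for a dictionary; replace the lambda body with a real lookup.
        Map<String, String> standIn = new HashMap<>();
        standIn.put("dog", "a domesticated canine");

        long average = averageNanos(() -> standIn.get("dog"), 100_000);
        System.out.println("average ns per lookup: " + average);
    }
}
```

Measure start-up time separately from lookup time, since the two dictionary types trade one against the other.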

     
  • Dictionaries are saved like this:

    d = Dictionary.getInstance...;
    // editing code skipped
    d.save();