
Using Other Knowledge Bases

S-Match uses two interfaces to extract knowledge out of natural language labels:

  • ILinguisticOracle provides access to linguistic knowledge, such as lemmas and senses
  • ISenseMatcher provides access to background knowledge, such as relations between senses

To access a knowledge base, one should provide an implementation of these two interfaces.
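
As a rough illustration of how the two roles divide, the sketch below uses simplified, hypothetical interfaces (ToyLinguisticOracle and ToySenseMatcher). They are not the actual ILinguisticOracle and ISenseMatcher signatures, which should be checked in the S-Match sources, and the words, sense identifiers and relation symbols are invented for the example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for ILinguisticOracle: maps a word to sense identifiers.
interface ToyLinguisticOracle {
    List<String> getSenses(String word);
}

// Hypothetical, simplified stand-in for ISenseMatcher: returns a relation between two senses.
interface ToySenseMatcher {
    // '=' same sense, '<' source is less general than target, '>' the inverse, '?' unknown
    char getRelation(String sourceSense, String targetSense);
}

// A toy knowledge base that plays both roles using in-memory maps (invented data).
class ToyKnowledgeBase implements ToyLinguisticOracle, ToySenseMatcher {
    private final Map<String, List<String>> senses = new HashMap<>();
    private final Map<String, String> hypernym = new HashMap<>(); // child sense -> parent sense

    ToyKnowledgeBase() {
        senses.put("car", List.of("n#car"));
        senses.put("automobile", List.of("n#car"));
        senses.put("vehicle", List.of("n#vehicle"));
        hypernym.put("n#car", "n#vehicle");
    }

    @Override
    public List<String> getSenses(String word) {
        // Linguistic knowledge: look up the senses of a (lower-cased) word.
        return senses.getOrDefault(word.toLowerCase(), List.of());
    }

    @Override
    public char getRelation(String sourceSense, String targetSense) {
        // Background knowledge: compare two senses using the toy hypernym map.
        if (sourceSense.equals(targetSense)) return '=';
        if (targetSense.equals(hypernym.get(sourceSense))) return '<';
        if (sourceSense.equals(hypernym.get(targetSense))) return '>';
        return '?';
    }
}
```

In this toy setup, the oracle is what preprocessing would consult to attach senses to labels, and the matcher is what reasoning would consult to relate those senses.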

The default configuration file, s-match.properties, provides access to a default linguistic oracle and background knowledge, both of which use WordNet 2.1. WordNet 2.1 is accessed using the extJWNL library, which is configured using the file_properties.xml configuration file.

Using Other Wordnets

S-Match uses extJWNL to access WordNet-like databases. extJWNL has several options for accessing the database files. Two of them are of interest here:

  • file-based access: dictionary files are accessed as they are. This method is slower, but requires little memory.
  • map-based access: dictionary files are converted first into HashMaps, serialized, and then accessed. This method is faster, but requires more memory.
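
The trade-off between the two access styles can be sketched with a toy example. This is not how extJWNL is implemented; the one-entry-per-line "lemma TAB value" file format and the class and method names below are assumptions made only for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

class DictionaryAccessSketch {

    // File-based style: read the file on every lookup. Little memory, slower lookups.
    static String lookupInFile(Path dict, String lemma) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(dict, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab > 0 && line.substring(0, tab).equals(lemma)) {
                    return line.substring(tab + 1);
                }
            }
        }
        return null; // lemma not found
    }

    // Map-based style: load everything once into a HashMap. More memory, faster lookups.
    static Map<String, String> loadIntoMap(Path dict) throws IOException {
        Map<String, String> map = new HashMap<>();
        for (String line : Files.readAllLines(dict, StandardCharsets.UTF_8)) {
            int tab = line.indexOf('\t');
            if (tab > 0) {
                map.put(line.substring(0, tab), line.substring(tab + 1));
            }
        }
        return map;
    }
}
```

With a large dictionary, the second method pays the loading cost and the memory once, after which each lookup is a single hash access.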

S-Match uses WordNet-like dictionaries as linguistic knowledge (during preprocessing, or "offline" processing, via ILinguisticOracle) and as background knowledge (during reasoning, or "online" processing, via ISenseMatcher). For the second interface, we provide two implementations:

  • WordNet
  • InMemoryWordNetBinaryArray

The second implementation significantly speeds up the "online" processing. Together, these implementations give flexibility in choosing between speed and memory requirements.

The GeoWordNet

GeoWordNet provides a large and rich knowledge base in WordNet format, and it is possible to use it with S-Match.

Using InMemoryWordNetBinaryArray

Configuring S-Match

Here we provide a step-by-step guide and sample configuration files for configuring S-Match to use GeoWordNet.

  1. Edit the bin\match-manager.cmd (or .sh) script and change -Xmx256M -Xms256M to -Xmx6G -Xms6G to allow more memory.
  2. Create the data\wordnet\geowordnet folder where the knowledge base will be stored. Create the following subfolders in it: dict and cache.
  3. Download the full version of GeoWordNet in dict format, geowordnet-dict-full-20110330.zip, and unpack it to the data\wordnet\geowordnet\dict folder. Alternatively, you might use a smaller compact version, which contains less data but also has smaller memory requirements.
  4. In the conf folder, create a file_properties-gwn.xml configuration file for extJWNL. This file provides file-based access to the dictionary files.
  5. In the conf folder, create an s-match-gwn.properties configuration file for S-Match. This file will point S-Match to the geowordnet knowledge base.
  6. In the conf folder, create an s-match-create-wn-caches-gwn.properties configuration file for S-Match. This file will be used to create a cache for the geowordnet knowledge base.
  7. To cache the new knowledge base, run the following command in the bin folder. This should create several files in the data\wordnet\geowordnet\cache folder:
    match-manager.cmd wntoflat -config=..\conf\s-match-create-wn-caches-gwn.properties
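
After the steps above, a quick check of the folder layout and of the heap available to a JVM can save a failed run. The helper below is a hypothetical sketch, not part of S-Match; it only assumes the dict and cache folders named in the steps and that it is run from the distribution root.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class GeoWordNetSetupCheck {

    // Returns human-readable problems; an empty list means both folders exist and are non-empty.
    static List<String> check(Path geowordnetDir) throws IOException {
        List<String> problems = new ArrayList<>();
        for (String name : new String[] {"dict", "cache"}) {
            Path sub = geowordnetDir.resolve(name);
            if (!Files.isDirectory(sub)) {
                problems.add("missing folder: " + sub);
            } else if (isEmpty(sub)) {
                problems.add("empty folder: " + sub);
            }
        }
        return problems;
    }

    private static boolean isEmpty(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return !entries.findAny().isPresent();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed location relative to the distribution root; adjust if run from elsewhere.
        Path dir = Path.of("data", "wordnet", "geowordnet");
        check(dir).forEach(System.out::println);

        // Heap limit of this JVM, for comparison with the -Xmx value set in step 1.
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Max heap available to this JVM: " + maxHeapMb + " MB");
    }
}
```

The heap figure only reflects the JVM running the check, so it is meaningful for S-Match only when the check is started with the same -Xmx and -Xms options as the match-manager script.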
    

Running the matching

Now, to run the matching, execute bin\match-manager.cmd as usual, adding -config=..\conf\s-match-gwn.properties so that S-Match uses the new knowledge base. Remember that for matching to work correctly, preprocessing and matching should be done using the same knowledge base. For example, to match the example classifications c.txt and w.txt using the geowordnet knowledge base, run the following commands:

  1. to convert the files into XML format which stores the preprocessing information:
    match-manager.cmd convert ..\test-data\cw\c.txt ..\test-data\cw\c.xml -config=..\conf\s-match-Tab2XML.properties
    match-manager.cmd convert ..\test-data\cw\w.txt ..\test-data\cw\w.xml -config=..\conf\s-match-Tab2XML.properties
    
  2. to preprocess the contexts using geowordnet knowledge base:
    match-manager.cmd offline ..\test-data\cw\c.xml ..\test-data\cw\c-gwn.xml -config=..\conf\s-match-gwn.properties
    match-manager.cmd offline ..\test-data\cw\w.xml ..\test-data\cw\w-gwn.xml -config=..\conf\s-match-gwn.properties
    
  3. to match the contexts using geowordnet knowledge base:
    match-manager.cmd online ..\test-data\cw\c-gwn.xml ..\test-data\cw\w-gwn.xml ..\test-data\cw\result-cw-gwn.txt -config=..\conf\s-match-gwn.properties
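
The way the file names chain through the three steps can be made explicit with a small helper that assembles the same command lines. This is a hypothetical convenience sketch, not part of S-Match: the class and method names are invented, and it uses only the commands, paths and naming pattern shown above.

```java
import java.util.ArrayList;
import java.util.List;

class GwnMatchCommands {

    // Builds the convert, offline and online commands for a pair of classification files.
    // dir: folder with the input files; source/target: file names without extension;
    // config: the knowledge base configuration file in the conf folder.
    static List<List<String>> build(String dir, String source, String target, String config) {
        String tool = "match-manager.cmd";
        String tab2xml = "-config=..\\conf\\s-match-Tab2XML.properties";
        String kb = "-config=..\\conf\\" + config;
        String s = dir + "\\" + source;
        String t = dir + "\\" + target;
        String result = dir + "\\result-" + source + target + "-gwn.txt";

        List<List<String>> commands = new ArrayList<>();
        // 1. convert the text files into XML
        commands.add(List.of(tool, "convert", s + ".txt", s + ".xml", tab2xml));
        commands.add(List.of(tool, "convert", t + ".txt", t + ".xml", tab2xml));
        // 2. preprocess both contexts with the knowledge base
        commands.add(List.of(tool, "offline", s + ".xml", s + "-gwn.xml", kb));
        commands.add(List.of(tool, "offline", t + ".xml", t + "-gwn.xml", kb));
        // 3. match the preprocessed contexts with the same knowledge base
        commands.add(List.of(tool, "online", s + "-gwn.xml", t + "-gwn.xml", result, kb));
        return commands;
    }

    public static void main(String[] args) {
        // Prints the commands for the c.txt and w.txt example; it does not execute them.
        for (List<String> command : build("..\\test-data\\cw", "c", "w", "s-match-gwn.properties")) {
            System.out.println(String.join(" ", command));
        }
    }
}
```

Passing the same configuration file to the offline and online steps keeps preprocessing and matching on the same knowledge base, as required above.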
    

Using WordNet

This configuration does not require as much memory for conversion as the previous one, but it is slower during matching.

  1. Follow steps 2-6 from the Configuring S-Match section above.
  2. Ensure the file MatchManager.java contains the following lines. Notice the order: the multiword cache is created first:
        private void convertWordNetToFlat(Properties properties) throws SMatchException {
            // first call: creates the multiword cache
            DefaultContextPreprocessor.createWordNetCaches(CONTEXT_PREPROCESSOR_KEY, properties);
            // second call: creates the caches for InMemoryWordNetBinaryArray
            InMemoryWordNetBinaryArray.createWordNetCaches(GLOBAL_PREFIX + SENSE_MATCHER_KEY, properties);
        }
    
  3. Run ant jar in the main folder of the distribution to compile the sources and update s-match.jar. See HowToBuild for details.
  4. Run the partial conversion to create the multiword cache:
    match-manager.cmd wntoflat -config=..\conf\s-match-create-wn-caches-gwn.properties
    
  5. Stop the conversion once the multiword cache (usually stored in data\wordnet\geowordnet\cache\multiwords.hash) has been created, that is, after the following output appears:
    Creating WordNet caches...
    Creating multiword hash...
    Multiwords: xxx
    Done
    
  6. In the conf folder, create an s-match-gwn-wn.properties configuration file for S-Match. This file will point S-Match to the geowordnet knowledge base.
  7. When running the matching (see steps 2 and 3 in the Running the matching section above), use the s-match-gwn-wn.properties configuration file:
    1. to convert the files into XML format, run the same commands as above
    2. to preprocess the contexts using geowordnet knowledge base:
      match-manager.cmd offline ..\test-data\cw\c.xml ..\test-data\cw\c-gwn.xml -config=..\conf\s-match-gwn-wn.properties
      match-manager.cmd offline ..\test-data\cw\w.xml ..\test-data\cw\w-gwn.xml -config=..\conf\s-match-gwn-wn.properties
      
    3. to match the contexts using geowordnet knowledge base:
      match-manager.cmd online ..\test-data\cw\c-gwn.xml ..\test-data\cw\w-gwn.xml ..\test-data\cw\result-cw-gwn.txt -config=..\conf\s-match-gwn-wn.properties
      

The Stanford Wordnet Project

The Stanford Wordnet Project provides several automatically created knowledge bases in WordNet format, including sense-clustered and augmented wordnets. It is possible to use these wordnets with S-Match in a similar way to GeoWordNet.

The MultiWordNet

MultiWordNet provides WordNet-like semantic knowledge bases in several languages. It is possible to use it with S-Match via extJWNL, for which there is an import procedure. If you already have a MultiWordNet license and database files, please contact us.

Attachments