OpenNLP Tools 1.5.0 released

Developers
2010-09-23
2016-07-22
  • Joern Kottmann

    Joern Kottmann - 2010-09-23

    We are proud to announce the release of OpenNLP Tools 1.5.0.
    It has been a long time since our last release, and we have added many
    new features that make OpenNLP easier to use.

    Here are the highlights:
    - Model packages now group all resources needed by a component, together with metadata, in a single zip package.
      Components can be instantiated from this single zip package
      instead of loading multiple resources depending on the training setup.
    - The command line interface has been rewritten and extended.
    - Built-in evaluation for most components.
    - New training API for most components.
    - Training support for CoNLL02 and CoNLL06 data.
    - The POS Tagger can now use a perceptron (sequence) model.
    - The license was changed from LGPL to ASL.
    - The Ant build system was replaced by Maven.

    Jörn

     
    • Weihong Zhang

      Weihong Zhang - 2013-08-22

      I am using a few R packages (openNLP 0.2-1, webmining, sentiment) to extract some sentences about the stock JPM, but I am running into the following error:

      Error in eval(expr, envir, enclos) : could not find function "sentDetect"

      Here is the code I used. I made sure that all packages are installed. I checked the "corpus" variable and it is "a corpus with 20 text documents". I also used "library(help=openNLP)" to list all the functions in the package "openNLP" but did not find "sentDetect" in the list.

      library(XML)
      library(tm)
      library(tm.plugin.webmining)
      library(tm.plugin.sentiment)
      library(NLP)
      library(openNLP)
      stock <-"JPM"
      corpus <- WebCorpus(GoogleFinanceSource(stock))
      sentences <- sentDetect(corpus)

      Here is the running environment. Is it possibly related to the R 3.0.1 version (too new for openNLP) or 64-bit Windows system?

      R version 3.0.1 (2013-05-16) -- "Good Sport" Copyright (C) 2013
      The R Foundation for Statistical Computing
      Platform: x86_64-w64-mingw32/x64 (64-bit)

      Thank you very much.

      Weihong

       
  • Martin

    Martin - 2010-11-18

    Hi, I have two questions:
    1) Can I download the 1.5 version with the source code? What I downloaded from this link only contains a .jar file.
        http://opennlp.sourceforge.net/

    2) In order to run POS tagger, do I have to run SentenceDetector & Tokenizer first? The OpenNLP instruction gives the following example:
    "
    POS Tagging:

    bin/opennlp SentenceDetector models/en-sent.bin < text |
    bin/opennlp TokenizerME models/en-token.bin |
    bin/opennlp POSTagger models/en-pos-maxent.bin

    "

    Can I directly run bin/opennlp POSTagger models/en-pos-maxent.bin without running the first two?

    Thanks.

     
  • Joern Kottmann

    Joern Kottmann - 2010-11-18

    The SourceForge download page also offers a source release; please click on "View all files" next to the download
    button.

    The POS Tagger analyzes one tokenized sentence at a time. Using the sentence detector and tokenizer as in the example above is one way to produce such input and is intended as a demonstration only.

    Depending on your use case you might already have text which is segmented into sentences and tokens. The API of the POS Tagger could be used to directly pass these sentences to the POS Tagger without using a command line or file system based interface.

    Hope that helps,
    Jörn

     
  • Martin

    Martin - 2010-11-19

    Thanks joernkottmann.

    Another question: the method 'Span[] find(String[] tokens)' takes an array of String as its argument. Why doesn't it directly take a String, which is typically a sentence? As it is, a sentence string first has to be converted to a String array, which seems very inconvenient.
     

     
  • Joern Kottmann

    Joern Kottmann - 2010-11-19

    Hi,

    The name finder's find method expects a string array. This string array "models" a tokenized sentence: each string is one token, and all tokens in the array form a sentence.
    To feed it just a string, the token spans would still have to be passed. Such a method could be added but is not there
    right now.

    In the end, to make everything more efficient we should probably think about moving to CharSequence and
    away from String.
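    To illustrate what "token spans" over a raw string could look like, here is a minimal sketch that splits on whitespace and records character offsets. It is not part of OpenNLP; the class and method names (WhitespaceSpans, tokenSpans) are made up for this example, and it accepts a CharSequence only to mirror the idea mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not an OpenNLP class. */
final class WhitespaceSpans {

    /** Returns {start, end} character offsets (end exclusive) for each whitespace-separated token. */
    static List<int[]> tokenSpans(CharSequence text) {
        List<int[]> spans = new ArrayList<>();
        int start = -1; // -1 means "currently not inside a token"
        for (int i = 0; i < text.length(); i++) {
            boolean whitespace = Character.isWhitespace(text.charAt(i));
            if (!whitespace && start < 0) {
                start = i;                          // a token begins here
            } else if (whitespace && start >= 0) {
                spans.add(new int[] {start, i});    // the token ended just before i
                start = -1;
            }
        }
        if (start >= 0) {
            spans.add(new int[] {start, text.length()}); // last token runs to the end
        }
        return spans;
    }

    public static void main(String[] args) {
        String sentence = "Rudolph Agnew , 55";
        for (int[] span : tokenSpans(sentence)) {
            System.out.println("[" + span[0] + ".." + span[1] + ") "
                + sentence.substring(span[0], span[1]));
        }
    }
}
```

    With offsets like these, a caller could hand over the original string plus the spans instead of building a String array first.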

    Hope that helps,
    Jörn

     
  • Martin

    Martin - 2010-11-23

    Joernkottmann:

    What's the problem with this code?
      NameFinderME nameModel = new NameFinderME(new TokenNameFinderModel(modelIn));
      String sentence = new String("Rudolph Agnew , 55 years old and former chairman of Consolidated Gold Fields PLC , was named a director of this British industrial conglomerate .");
      String[] tokens = sentence.split(" ");
      Span[] neTokens = nameModel.find(tokens);

      String myTokens = new String();
      String[] tons = Span.spansToStrings(neTokens, myTokens); /* ERROR occurs here ! */

    Exception in thread "main" java.lang.IllegalArgumentException: The span 0..2 is outside the given text!
    at opennlp.tools.util.Span.getCoveredText(Span.java:178)
    at opennlp.tools.util.Span.spansToStrings(Span.java:262)
    at Test.main(Test.java:46)

    1) What's the way to print out the identified Named Entities?

     
  • James Kosin

    James Kosin - 2010-11-23

    marlomin,

    Just remove the 'String myTokens = new String();' line and change the line following it to 'String[] tons = Span.spansToStrings(neTokens, tokens);'. That should fix the problem.
    You are passing an empty string for spansToStrings() to extract the names from.

    James

     
  • Martin

    Martin - 2010-11-23

    This seems confusing:
    Span[] neTokens = nameModel.find(tokens);

    The passed param "tokens" contains the sentence, and the neTokens returned by "find" contains "token spans for any identified names."

    Here,
    String[] tons = Span.spansToStrings(neTokens, tokens);

    1) Why do we need to pass the sentence array "tokens" again to get the NE tokens?
        Why can't we directly get the NE tokens from the NE token span array "neTokens"?

    2) Another question has to do with the ChunkerME class.
    The method "chunk(String[] toks, String[] tags)" only returns "chunk tags for the given sequence returning the result in an array." It only returns a tag array. How do I get the chunked text array?

    Thanks.

     
  • James Kosin

    James Kosin - 2010-11-24

    Here,
    String[] tons = Span.spansToStrings(neTokens, tokens);

    1) Why do we need to pass the sentence array "tokens" again to get the NE tokens?
        Why can't we directly get the NE tokens from the NE token span array "neTokens"?

    Span is a class that contains the start and end indexes into the array you originally passed in, so it is lightweight.
    It doesn't keep a copy of the original array.  You can get the strings with spansToStrings(), or you can read the spans yourself and use their indexes into the original tokens to build the resulting strings.  It is just a matter of preference.
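    As a rough illustration of why the original tokens have to be passed in again, here is a sketch with a stand-in class. TokenSpan is hypothetical and is not the OpenNLP Span class; it assumes the end index is exclusive, which is what the "0..2" in the exception above suggests for the two tokens "Rudolph Agnew".

```java
import java.util.Arrays;

/** Hypothetical stand-in for a token span; NOT the OpenNLP Span class. */
final class TokenSpan {
    final int start; // index of the first covered token
    final int end;   // index one past the last covered token (assumed exclusive)

    TokenSpan(int start, int end) {
        this.start = start;
        this.end = end;
    }

    /** Joins the covered tokens; the span itself stores only indexes, so the tokens must be supplied. */
    String coveredText(String[] tokens) {
        if (start < 0 || start > end || end > tokens.length) {
            throw new IllegalArgumentException(
                "The span " + start + ".." + end + " is outside the given tokens!");
        }
        return String.join(" ", Arrays.copyOfRange(tokens, start, end));
    }
}

final class SpanSketch {
    public static void main(String[] args) {
        String[] tokens = "Rudolph Agnew , 55 years old".split(" ");
        TokenSpan name = new TokenSpan(0, 2);
        System.out.println(name.coveredText(tokens)); // prints "Rudolph Agnew"
    }
}
```

    Passing an empty array to coveredText() in this sketch fails in the same spirit as the exception reported above: the indexes point at positions that do not exist in what was handed over.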

    2) Another question has to do with the ChunkerME class.
    The method "chunk(String[] toks, String[] tags)" only returns "chunk tags for the given sequence returning the result in an array." It only returns a tag array. How do I get the chunked text array?

    Start by looking at the cmdline interface classes; they will show you more.  Many of the main() functions in the classes are no longer well maintained and are being deprecated.
    Basically, it would be the same operation.  The tags you get back should match up in number with the tokens you passed in, so you can print each token with its tag by iterating through both arrays.
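    A minimal sketch of that iteration, going one step further and grouping tokens into chunk strings. It does not call OpenNLP at all: it assumes the tag array uses the common "B-XX" (chunk start), "I-XX" (continuation) and "O" (outside) scheme, and the class name ChunkGrouping and the sample tags are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not an OpenNLP class. */
final class ChunkGrouping {

    /**
     * Groups tokens into strings such as "[NP the deficit]".
     * Assumes "B-XX" starts a chunk, "I-XX" continues the open chunk, and "O" is outside any chunk.
     * The type of an "I-" tag is not checked against the open chunk.
     */
    static List<String> group(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        List<String> chunks = new ArrayList<>();
        StringBuilder current = null; // the chunk being built, or null if none is open
        for (int i = 0; i < tokens.length; i++) {
            String tag = tags[i];
            if (current != null && tag.startsWith("I-")) {
                current.append(' ').append(tokens[i]); // continue the open chunk
                continue;
            }
            if (current != null) {                      // close the open chunk
                chunks.add(current.append(']').toString());
                current = null;
            }
            if (tag.startsWith("B-") || tag.startsWith("I-")) {
                // start a new chunk (a stray "I-" without a "B-" is treated as a start)
                current = new StringBuilder("[" + tag.substring(2) + " " + tokens[i]);
            } else {
                chunks.add(tokens[i]);                  // token outside any chunk
            }
        }
        if (current != null) {
            chunks.add(current.append(']').toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        String[] tokens = {"He", "reckons", "the", "current", "account", "deficit"};
        String[] tags = {"B-NP", "B-VP", "B-NP", "I-NP", "I-NP", "I-NP"};
        for (String chunk : group(tokens, tags)) {
            System.out.println(chunk);
        }
    }
}
```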

    James

     
  • Joern Kottmann

    Joern Kottmann - 2010-11-24

    Why can't we directly get the NE tokens from the NE token span array "NETokens"?

    There are people who need the spans, so just returning the name tokens wouldn't work for them,
    but returning the spans actually works for both cases.
    In older versions the name finder just returned a string array containing tags which indicate the spans;
    now it's really easy to retrieve the tokens of the names.
    The chunker still returns tags and will be updated to the new style in one of the next versions.

    Jörn

     
  • Purvi Desai

    Purvi Desai - 2012-05-05

    I am getting the java.lang.IllegalArgumentException: The span [4538..4545) is outside the given text! when I use the tokenize(text) method. Any idea why that might be happening? I am passing it a String called text and I get this error only in some cases. Any help would be appreciated!

     
    • Eyyüp Aydın

      Eyyüp Aydın - 2016-07-22

      Hi.
      I got the same error and I have found the glitch.
      I was using OpenNLP in a multi-threaded application. It turns out the tokenizing components don't work properly when shared across threads. So I loaded every file ("en-sent.bin", "en-token.bin" and "en-pos-maxent.bin") in the thread class. This way each thread has its own model.
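      One way to sketch the "one instance per thread" idea is a ThreadLocal. The example below does not use OpenNLP: StatefulTokenizer is a made-up stand-in that, by assumption, keeps mutable state between calls and so must not be shared. In a real application the ThreadLocal initializer would be the place to build the per-thread component.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PerThreadTokenizer {

    /** Hypothetical stand-in for a component that is not safe to share between threads. */
    static final class StatefulTokenizer {
        private final List<String> buffer = new ArrayList<>(); // mutable state reused between calls

        String[] tokenize(String text) {
            buffer.clear();
            for (String token : text.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    buffer.add(token);
                }
            }
            return buffer.toArray(new String[0]);
        }
    }

    // Each thread lazily gets its own tokenizer the first time it calls get().
    private static final ThreadLocal<StatefulTokenizer> TOKENIZER =
        ThreadLocal.withInitial(StatefulTokenizer::new);

    static String[] tokenize(String text) {
        return TOKENIZER.get().tokenize(text);
    }

    /** Tokenizes all texts on a thread pool and returns the total number of tokens. */
    static int countTokensConcurrently(List<String> texts, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (String text : texts) {
                results.add(pool.submit(() -> tokenize(text).length));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("tokenizing failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> texts = Arrays.asList("a b c", "d e", "f");
        System.out.println(countTokensConcurrently(texts, 3));
    }
}
```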

      Hope this helps.

       
