New to NLP, Question about annotation

  • Anonymous - 2010-11-29

    I am new to NLP and I am looking for a starting point, in terms of some tutorials, documentation or example code.
    I have been told to research the possibilities of processing natural text to extract some structured data from it.
    For example, I want to extract (annotate) height and weight from statements such as
    "He is 6 feet tall and weighs 200 pounds" or
    "His height is 6 feet and weight is 200".
    I have looked into UIMA, but it seems to be a self-created regex dictionary with no training capabilities.
    So in a nutshell: what Java framework can I use to create an annotation engine that can also be trained?
    Any help or pointers on this would be greatly appreciated.

  • Joern Kottmann - 2010-11-30

    UIMA is just a framework for building an application like the one you are investigating. It does not
    offer any analysis capabilities itself, but instead helps you put together an analysis application
    out of pre-built NLP engines.

    OpenNLP can help you perform tokenization and sentence detection. To extract
    your structured data, you first need to find the "pieces" you want to extract;
    the name finder could help you there. Trained on annotated examples, it can detect
    the height and weight mentions.
    Afterwards you would need an additional step to extract the detected numbers
    and convert them to a machine-understandable format, e.g. a double for the height
    in cm.
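
    The final conversion step could be sketched in plain Java roughly as follows. This is
    only a minimal sketch and not part of the OpenNLP or UIMA API: the class
    MeasurementNormalizer, its method names and the unit tables are hypothetical, and it
    assumes the name finder has already handed back the covered text of each detected
    span (e.g. "6 feet"). It also assumes that a bare number means feet for a height and
    pounds for a weight, which is a guess based on the example sentences. Map.of needs
    Java 9 or later.

```java
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not an OpenNLP or UIMA class.
final class MeasurementNormalizer {

    // Conversion factors from a unit word to centimetres.
    private static final Map<String, Double> TO_CM = Map.of(
            "feet", 30.48, "foot", 30.48, "ft", 30.48,
            "inches", 2.54, "inch", 2.54, "in", 2.54,
            "cm", 1.0);

    // Conversion factors from a unit word to kilograms.
    private static final Map<String, Double> TO_KG = Map.of(
            "pounds", 0.45359237, "pound", 0.45359237,
            "lbs", 0.45359237, "lb", 0.45359237,
            "kg", 1.0);

    // A number, optionally followed by a unit word, e.g. "6 feet" or "200".
    private static final Pattern QUANTITY =
            Pattern.compile("(\\d+(?:\\.\\d+)?)\\s*([A-Za-z]+)?");

    private MeasurementNormalizer() { }

    // Assumption: a height without a unit is given in feet.
    static double heightToCm(String span) {
        return convert(span, TO_CM, "feet");
    }

    // Assumption: a weight without a unit is given in pounds.
    static double weightToKg(String span) {
        return convert(span, TO_KG, "pounds");
    }

    private static double convert(String span, Map<String, Double> factors,
                                  String defaultUnit) {
        Matcher m = QUANTITY.matcher(span);
        if (!m.find()) {
            throw new IllegalArgumentException("No number in span: " + span);
        }
        double value = Double.parseDouble(m.group(1));
        String unit = m.group(2) == null
                ? defaultUnit
                : m.group(2).toLowerCase(Locale.ROOT);
        Double factor = factors.get(unit);
        if (factor == null) {
            throw new IllegalArgumentException("Unknown unit: " + unit);
        }
        return value * factor;
    }
}
```

    With this sketch, MeasurementNormalizer.heightToCm("6 feet") yields roughly 182.88 and
    MeasurementNormalizer.weightToKg("200") falls back to pounds, giving roughly 90.72.
    Compound mentions such as "6 feet 2 inches" are not handled; only the first number
    in the span is used.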

    OpenNLP also offers a UIMA integration, which can be used for the
    tokenization, sentence detection and named entity extraction tasks you might need
    to solve.

    Hope that helps,

