#5 Apache Tika

Next Release
Converters (5)

Apache Tika™ is a toolkit for detecting and extracting metadata and structured text content from a wide range of document types using existing parser libraries. It provides command-line access to content parsers, developed for the Apache Lucene project, for formats including OOXML, ODF, RTF, PDF, EPUB, HTML and XML. It can also parse metadata from several audio and image formats, as well as FLV. It is available from http://tika.apache.org.
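
For anyone evaluating this as a converter, here is a rough sketch of how the command-line access described above might be driven from a script. It assumes a downloaded `tika-app` jar and a `java` executable on the PATH; the jar path and the helper names `build_tika_command` and `run_tika` are hypothetical, and the `--text`, `--metadata` and `--detect` options are those of the `tika-app` command-line tool, so check `--help` on the release you actually have.

```python
import subprocess

# Output modes mapped to tika-app command-line options.
# Assumption: the installed tika-app release supports these flags.
TIKA_MODES = {
    "text": "--text",          # plain text content
    "metadata": "--metadata",  # metadata key/value pairs
    "detect": "--detect",      # detected media type only
}


def build_tika_command(jar_path, document, mode="text"):
    """Return the argument list for one tika-app invocation (hypothetical helper)."""
    if mode not in TIKA_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return ["java", "-jar", jar_path, TIKA_MODES[mode], document]


def run_tika(jar_path, document, mode="text"):
    """Run tika-app and return whatever it writes to standard output."""
    result = subprocess.run(
        build_tika_command(jar_path, document, mode),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if java exits with an error
    )
    return result.stdout


if __name__ == "__main__":
    # Both paths are placeholders; substitute a real jar and document.
    print(run_tika("tika-app.jar", "example.pdf", mode="metadata"))
```

Keeping the argument list in its own function means the command can be inspected or logged without Java or Tika being installed.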


  • John Dickinson

    John Dickinson - 2011-01-13
    • status: open --> pending
  • SourceForge Robot

    This Tracker item was closed automatically by the system. It was
    previously set to a Pending status, and the original submitter
    did not respond within 14 days (the time period specified by
    the administrator of this Tracker).

  • SourceForge Robot

    • status: pending --> closed
  • John Dickinson

    John Dickinson - 2011-04-05


  • John Dickinson

    John Dickinson - 2011-04-05
    • assigned_to: nobody --> einarin
    • status: closed --> open
  • John Dickinson

    John Dickinson - 2011-04-05
    • status: open --> closed
