File extractor: Apache Tika

Anonymous
2010-05-27
2013-04-15

    Hi

    It would be great to integrate Apache Tika into Rivulet ES.
    Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.
    It supports the following formats:
    * HyperText Markup Language
    * XML and derived formats
    * Microsoft Office document formats
    * OpenDocument Format
    * Portable Document Format
    * Electronic Publication Format
    * Rich Text Format
    * Compression and packaging formats
    * Text formats
    * Audio formats
    * Image formats
    * Video formats
    * Java class files and archives
    * The mbox format

    Thanks and regards,
    Dimce Iliev