An open source RESTful web service for text and metadata extraction and analysis.
oss-text-extractor supports various binary formats:
Word processor (doc, docx, odt, rtf)
Spreadsheet (xls, xlsx, ods)
Presentation (ppt, pptx, odp)
Publishing (pdf, pub)
Web (rss, html/xhtml)
Media (audio, images)
Others (vsd, text)
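
As a rough illustration of the groupings above, the following Python sketch maps a file name to its category by extension. The names `FORMAT_CATEGORIES` and `category_for` are hypothetical and not part of oss-text-extractor. Only the entries listed above that are literal file extensions are included, so the generic "audio", "images" and "text" entries are left out.

```python
from pathlib import Path

# Hypothetical lookup table built from the format list above.
# It is not taken from the oss-text-extractor code base.
FORMAT_CATEGORIES = {
    "Word processor": {"doc", "docx", "odt", "rtf"},
    "Spreadsheet": {"xls", "xlsx", "ods"},
    "Presentation": {"ppt", "pptx", "odp"},
    "Publishing": {"pdf", "pub"},
    "Web": {"rss", "html", "xhtml"},
    "Others": {"vsd"},
}


def category_for(filename):
    """Return the category name for a file, or None if its extension is not listed."""
    # Path.suffix yields only the last extension, including the leading dot.
    ext = Path(filename).suffix.lower().lstrip(".")
    for category, extensions in FORMAT_CATEGORIES.items():
        if ext in extensions:
            return category
    return None
```

A client could use a check like this to decide whether a file is worth submitting to the service before uploading it.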