With HelloNzb you can download (binary) files from Usenet servers via NZB index files. The software is based on Java and can thus run on many platforms (tested on Windows and Linux). Features include automatic archive verification via PAR2, automatic RAR archive extraction, and built-in yEnc and UU decoding. It is portable and requires no installation.
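As a rough illustration of what an NZB index file carries, the sketch below reads one with the standard Java DOM parser and lists each file's segments. It is not HelloNzb's own code: the class name NzbSketch, the helper parse, and the sample data are made up for this example, and it assumes the commonly used NZB layout in which each file element holds segment elements with a bytes attribute and a message-ID as text.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper class for illustration; not part of HelloNzb.
class NzbSketch {

    // One <file> entry: its subject, segment message-IDs, and summed size.
    static final class NzbFile {
        final String subject;
        final List<String> messageIds = new ArrayList<>();
        long totalBytes;

        NzbFile(String subject) {
            this.subject = subject;
        }
    }

    // Made-up sample document following the assumed NZB layout.
    static final String SAMPLE =
          "<nzb xmlns='http://www.newzbin.com/DTD/2003/nzb'>"
        + "<file poster='poster@example.com' date='1071674882' subject='example.rar'>"
        + "<groups><group>alt.binaries.example</group></groups>"
        + "<segments>"
        + "<segment bytes='102394' number='1'>part1@example.com</segment>"
        + "<segment bytes='4501' number='2'>part2@example.com</segment>"
        + "</segments>"
        + "</file>"
        + "</nzb>";

    // Parses NZB XML and returns one NzbFile per <file> element.
    static List<NzbFile> parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        List<NzbFile> files = new ArrayList<>();
        NodeList fileNodes = doc.getElementsByTagName("file");
        for (int i = 0; i < fileNodes.getLength(); i++) {
            Element fileEl = (Element) fileNodes.item(i);
            NzbFile file = new NzbFile(fileEl.getAttribute("subject"));

            // Each segment is one Usenet article, identified by its message-ID.
            NodeList segments = fileEl.getElementsByTagName("segment");
            for (int j = 0; j < segments.getLength(); j++) {
                Element seg = (Element) segments.item(j);
                file.messageIds.add(seg.getTextContent().trim());
                file.totalBytes += Long.parseLong(seg.getAttribute("bytes"));
            }
            files.add(file);
        }
        return files;
    }

    public static void main(String[] args) throws Exception {
        for (NzbFile f : parse(SAMPLE)) {
            System.out.println(f.subject + ": " + f.messageIds.size()
                + " segments, " + f.totalBytes + " bytes");
        }
    }
}
```

A downloader would then fetch each message-ID from the server, decode the article bodies (yEnc or UU), and join them in segment order. Those steps are left out here.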
A RESTful/JSON Web Service for text and metadata extraction
An open source RESTful Web Service for text and metadata extraction and analysis.
oss-text-extractor supports various binary formats:
Word processor (doc, docx, odt, rtf)
Spreadsheet (xls, xlsx, ods)
Presentation (ppt, pptx, odp)
Publishing (pdf, pub)
Web (rss, html/xhtml)
Media (audio, images)
Others (vsd, text)
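The format list above can be pictured as a simple lookup from file extension to category. The following is only a sketch built from that list. It does not use or reflect oss-text-extractor's actual API, and the names FormatCategories and categoryOf are hypothetical. Media types and plain text are left out of the table because the list names them as kinds of content rather than as specific extensions.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical lookup table for illustration; not part of oss-text-extractor.
class FormatCategories {

    // Category name -> extensions, copied from the list of supported formats.
    static final Map<String, List<String>> CATEGORIES = new LinkedHashMap<>();
    static {
        CATEGORIES.put("Word processor", Arrays.asList("doc", "docx", "odt", "rtf"));
        CATEGORIES.put("Spreadsheet", Arrays.asList("xls", "xlsx", "ods"));
        CATEGORIES.put("Presentation", Arrays.asList("ppt", "pptx", "odp"));
        CATEGORIES.put("Publishing", Arrays.asList("pdf", "pub"));
        CATEGORIES.put("Web", Arrays.asList("rss", "html", "xhtml"));
        CATEGORIES.put("Others", Arrays.asList("vsd"));
    }

    // Returns the category for a file name, or "Unknown" if the extension
    // is missing or not in the table.
    static String categoryOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return "Unknown";
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        for (Map.Entry<String, List<String>> entry : CATEGORIES.entrySet()) {
            if (entry.getValue().contains(ext)) {
                return entry.getKey();
            }
        }
        return "Unknown";
    }

    public static void main(String[] args) {
        System.out.println(categoryOf("report.DOCX"));
        System.out.println(categoryOf("budget.ods"));
        System.out.println(categoryOf("README"));
    }
}
```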
Scan, the Semantic Content ANnotator, is a semantic pipeline that helps connect information extraction tools to semantic databases. It is UIMA-based and makes it easy to write plugins for information extraction, ontology control, and storage in RDF repositories.