A RESTful/JSON Web Service for text and metadata extraction
An open source RESTful Web Service for text and metadata extraction and analysis.
oss-text-extractor supports various binary formats:
Word processor (doc, docx, odt, rtf)
Spreadsheet (xls, xlsx, ods)
Presentation (ppt, pptx, odp)
Publishing (pdf, pub)
Web (rss, html/xhtml)
Media (audio, images)
Others (vsd, text)
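As an illustration of the format list above, here is a minimal Python sketch that maps a file name to one of the listed categories by its extension. It is not part of oss-text-extractor: the names FORMAT_CATEGORIES and category_for are hypothetical, the "txt" extension for plain text is an assumption, and the media entry is omitted because the list gives no extensions for it.

```python
from pathlib import Path

# Categories and extensions copied from the list above.
# The media entry (audio, images) names no extensions, so it is left out.
# "txt" is an assumed extension for the "text" entry.
FORMAT_CATEGORIES = {
    "Word processor": {"doc", "docx", "odt", "rtf"},
    "Spreadsheet": {"xls", "xlsx", "ods"},
    "Presentation": {"ppt", "pptx", "odp"},
    "Publishing": {"pdf", "pub"},
    "Web": {"rss", "html", "xhtml"},
    "Others": {"vsd", "text", "txt"},
}


def category_for(filename):
    """Return the listed category for a file name, or None if unlisted."""
    # Path.suffix keeps the leading dot, e.g. ".DOCX"; normalise it.
    ext = Path(filename).suffix.lower().lstrip(".")
    for category, extensions in FORMAT_CATEGORIES.items():
        if ext in extensions:
            return category
    return None
```

A client could use such a lookup to decide whether a file is worth sending to the service at all; the service itself may well detect formats differently, for example by inspecting file content.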
IdeoReport is a Java-based set of packages that generates reports in a variety of output formats, including xls, pdf, jpeg, xml, csv and html.
It can be integrated into existing applications (Java and non-Java) via different connectors.
JWDE extracts product information from the web and dumps it into a database so that e-commerce packages can use it. Currently, JWDE can extract information from saved HTML files and convert it into the JWDE XML format, which can be saved to the osCMax e-commerce product.