SILVERCODERS DocToText is a utility that converts documents in many formats to plain text. It includes a console application and a C/C++ library, which allows text extraction to be embedded in other applications.

It supports the following formats: MS Office binary formats, namely MS Word (DOC), MS Excel (XLS) and MS PowerPoint (PPT); Rich Text Format (RTF); OpenDocument (also known as ODF and ISO/IEC 26300, full name: OASIS Open Document Format for Office Applications) text documents (ODT), spreadsheets (ODS) and presentations (ODP); Office Open XML (ISO/IEC 29500, also called OOXML, OpenXML or MSOOXML) documents for MS Word (DOCX), MS Excel (XLSX) and MS PowerPoint (PPTX); and HyperText Markup Language (HTML).

DocToText can also be used as a fast console viewer or to recover text from corrupted documents. It can extract text not only from the document body but also from annotations (comments) embedded in ODT, DOC, DOCX or RTF files, and it can read metadata such as the author, last modification date or number of pages.
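
To give a rough idea of what extracting body text from one of these formats involves, here is a minimal sketch in Python for DOCX only. It is not DocToText's code or API; it uses only the Python standard library, and the names `docx_to_text` and `make_sample_docx` are hypothetical helpers invented for this illustration. It relies on the fact that a DOCX file is a ZIP archive whose main body is stored in `word/document.xml`, with text runs held in `w:t` elements inside `w:p` paragraphs. Comments, metadata, tables and damaged files are ignored here.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the DOCX body XML.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_to_text(data: bytes) -> str:
    """Return the body text of a DOCX file, one line per paragraph.

    Hypothetical helper for illustration; not part of DocToText.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # Join the text of every run (w:t) found inside this paragraph.
        runs = [node.text or "" for node in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


def make_sample_docx() -> bytes:
    """Build a tiny in-memory archive with only the body part (demo data)."""
    body = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Hello</w:t></w:r><w:r><w:t>, world</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", body)
    return buffer.getvalue()


if __name__ == "__main__":
    print(docx_to_text(make_sample_docx()))
```

The sample archive contains only the body part, which is enough for this sketch but would not open in a word processor. A real extractor has to handle much more, including the binary DOC, XLS and PPT formats, which are not ZIP-based.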
