SILVERCODERS DocToText is a utility that converts documents to plain text. It includes a console application and a C/C++ library that allows text extraction to be embedded in other applications.

It supports the following formats: MS Office binary formats, namely MS Word (DOC), MS Excel (XLS, XLSB) and MS PowerPoint (PPT); Rich Text Format (RTF); OpenDocument (also known as ODF and ISO/IEC 26300; full name: OASIS Open Document Format for Office Applications) text documents (ODT), spreadsheets (ODS) and presentations (ODP); Office Open XML (ISO/IEC 29500, also called OOXML, OpenXML or MSOOXML) documents, namely MS Word (DOCX), MS Excel (XLSX) and MS PowerPoint (PPTX); iWork; ODFXML (FODP, FODS, FODT); PDF; EML (emails); and HTML.

DocToText can also be used for searching, indexing and archiving, as a fast console viewer, or to recover text from corrupted documents. It can also extract text from annotations (comments) and read metadata such as the author, last modification date or number of pages.

User Reviews

  • As a note of full disclosure, I sponsored some of the recent development of this application, especially its capabilities for extracting from corrupt MS Office 2007 files. That being said, this is a fantastic command line text/data extractor for MS Office files. I have experience with it extracting text from corrupt docx and pptx files where Word and PowerPoint 2007 themselves fail. Additionally, I have experience with it extracting data from corrupt xlsx files. It may not be as effective at recovering text from corrupt Word 97-2003 doc files, Open Office files or RTF files; I have only limited experience with these. The developers have yet to include features for recovering data from Excel 97-2003 files or text from PowerPoint 97-2003 files, corrupt or not. This software is a very effective command line extractor of text from non-corrupt doc, docx, pptx, odt, ods, odp and rtf files, as well as data from xlsx files. Data from xlsx files is returned as tab-delimited text rather than in the perhaps more common csv format. This app is very well suited as an easy building block or back end for a powerful GUI or web service that extracts from corrupt Office 2007 files, and for converters of non-corrupt MS Office, Open Office and RTF files. Again, it is possible the program will work on corrupt doc, odt, ods, odp and rtf files, but I have limited experience with these.
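
The review above notes that spreadsheet data comes back as tab-delimited text rather than CSV. For readers who need CSV downstream, the following is a minimal C++ sketch of that conversion. It is not part of DocToText: the function names csvEscape and tsvLineToCsv are hypothetical, and the sketch assumes each output line holds one row with cells separated by single tab characters and no tabs inside cells.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a DocToText function): quote one field for CSV
// when it contains a comma, a double quote, or a line break.
std::string csvEscape(const std::string& field) {
    if (field.find_first_of(",\"\r\n") == std::string::npos) {
        return field;  // plain field, no quoting needed
    }
    std::string out = "\"";
    for (char c : field) {
        if (c == '"') out += '"';  // embedded quotes are doubled
        out += c;
    }
    out += '"';
    return out;
}

// Hypothetical helper: convert one tab-delimited line into one CSV line.
// Assumption: cells never contain tab characters themselves.
std::string tsvLineToCsv(const std::string& line) {
    std::string out;
    std::size_t start = 0;
    while (true) {
        const std::size_t tab = line.find('\t', start);
        const std::size_t len =
            (tab == std::string::npos) ? std::string::npos : tab - start;
        out += csvEscape(line.substr(start, len));
        if (tab == std::string::npos) break;  // last field on the line
        out += ',';
        start = tab + 1;
    }
    return out;
}
```

Calling tsvLineToCsv on each line of the extracted text, for example while reading it with std::getline, yields comma-separated rows. Quoting is applied only where a cell would otherwise be ambiguous, so simple cells pass through unchanged.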

Additional Project Details

Intended Audience

Advanced End Users, Developers, End Users/Desktop

User Interface


Programming Language

C, C++