SILVERCODERS DocToText is a powerful utility that can convert documents to plain text. It includes a console application and C/C++ library, which allows embedding text extraction mechanisms into other applications.
It supports MS Office binary formats: MS Word (DOC), MS Excel (XLS, XLSB) and MS PowerPoint (PPT); Rich Text Format (RTF); OpenDocument (also known as ODF and ISO/IEC 26300, full name: OASIS Open Document Format for Office Applications): text documents (ODT), spreadsheets (ODS) and presentations (ODP); Office Open XML (ISO/IEC 29500, also called OOXML, OpenXML or MSOOXML) documents: MS Word (DOCX), MS Excel (XLSX) and MS PowerPoint (PPTX); iWork; ODFXML (FODP, FODS, FODT); PDF; EML (emails); and HTML.
DocToText can also be used for searching, indexing and archiving, as a fast console viewer, or to recover text from corrupted documents. It can also extract text from annotations (comments) and read metadata such as author, last modification date or number of pages.
As a note of full disclosure, I sponsored some of the recent development of this application, especially its capabilities for extracting from corrupt MS Office 2007 files. That being said, this is a fantastic command line text/data extractor for MS Office files. I have experience with it extracting text from corrupt docx and pptx files where Word and PowerPoint 2007 themselves fail, and with it extracting data from corrupt xlsx files. It may not be as effective at recovering text from corrupt Word 97-2003 doc, Open Office or RTF files; I have only limited experience with these. The developers have yet to include features for recovering data from Excel 97-2003 files or text from PowerPoint 97-2003 files, corrupt or not.
This software is a very effective command line extractor of text from non-corrupt doc, docx, pptx, odt, ods, odp and rtf files, as well as data from xlsx files. Data from xlsx files is returned as tab-delimited text files rather than in the perhaps more common csv format. The app is very well suited as an easy building block or back end for a powerful GUI or web service that extracts from corrupt Office 2007 files, and for converters of non-corrupt MS Office, Open Office and RTF files. Again, it is possible the program will work on corrupt doc, odt, ods, odp and rtf files, but I have limited experience with these.
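Since the xlsx data comes back tab-delimited rather than as csv, anyone who needs csv can convert it in a few lines. The following is only a rough sketch in Python using the standard csv module; it is not part of DocToText. The function name tsv_to_csv is made up for illustration, and it assumes the extracted text has one row per line with cells separated by single tab characters.

```python
import csv
import io


def tsv_to_csv(tsv_text: str) -> str:
    """Convert tab-delimited text to CSV text (hypothetical helper).

    Assumes one row per line and a single tab between cells.
    """
    # QUOTE_NONE makes the reader treat any quote characters as literal text,
    # since plain tab-delimited output is not expected to use CSV quoting.
    reader = csv.reader(
        io.StringIO(tsv_text, newline=""),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,
    )
    out = io.StringIO(newline="")
    # The writer adds quotes only where CSV requires them (for example,
    # around cells that contain a comma).
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


if __name__ == "__main__":
    sample = "name\tnote\nA\tx, y\n"
    print(tsv_to_csv(sample), end="")
```

Cells that contain commas get quoted on the way out, which is the main thing a naive replace of tabs with commas would get wrong.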