Extracts plain text from documents in all popular formats.

Download: doctotext-4.0-20140202.tar.bz2
Platforms: Windows, BSD, Mac, Linux


SILVERCODERS DocToText is a powerful utility that converts documents to plain text. It includes a console application and a C/C++ library, which allows text extraction to be embedded into other applications.
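One way to embed extraction without linking the library is to call the console application from another program. The sketch below does this from Python. It is a minimal sketch that assumes the executable is on PATH as `doctotext`, takes the input file as its last argument, and prints the extracted text to standard output as UTF-8. `extract_text` and its `command` parameter are hypothetical names chosen for illustration. No command-line options are used.

```python
import subprocess
from typing import Sequence


def extract_text(path: str, command: Sequence[str] = ("doctotext",)) -> str:
    """Run the DocToText console application on `path` and return its output.

    Hypothetical helper. Assumes `doctotext` is on PATH, accepts the input
    file as its last argument and writes the extracted text to stdout.
    """
    result = subprocess.run(
        [*command, path],
        capture_output=True,   # collect stdout/stderr instead of printing them
        encoding="utf-8",      # assumption: the tool emits UTF-8 text
        errors="replace",      # don't crash on stray undecodable bytes
        check=True,            # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

For example, `extract_text("report.docx")` would return the document's text as a string. Assuming the tool exits with a non-zero status when extraction fails, `check=True` turns that failure into a `subprocess.CalledProcessError`.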

It supports the following formats:
  • MS Office binary formats: MS Word (DOC), MS Excel (XLS, XLSB) and MS PowerPoint (PPT)
  • Rich Text Format (RTF)
  • OpenDocument (also known as ODF and ISO/IEC 26300; full name: OASIS Open Document Format for Office Applications): text documents (ODT), spreadsheets (ODS) and presentations (ODP)
  • Office Open XML (ISO/IEC 29500, also called OOXML, OpenXML or MSOOXML) documents: MS Word (DOCX), MS Excel (XLSX) and MS PowerPoint (PPTX)
  • iWork
  • ODFXML (FODP, FODS, FODT)
  • PDF
  • EML (emails)
  • HTML
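A front end that feeds files to DocToText might check a file against the supported formats before calling it. The sketch below keeps the extensions in a lookup table. It is a hypothetical Python helper and is not part of DocToText. It maps only the extensions that the format list names. iWork files are therefore not covered, because the list does not give their extensions.

```python
from pathlib import Path
from typing import Optional

# Extension -> format family, transcribed from the supported-format list.
# Hypothetical helper for illustration; not part of DocToText.
SUPPORTED_FORMATS = {
    ".doc": "MS Office binary",
    ".xls": "MS Office binary",
    ".xlsb": "MS Office binary",
    ".ppt": "MS Office binary",
    ".rtf": "Rich Text Format",
    ".odt": "OpenDocument",
    ".ods": "OpenDocument",
    ".odp": "OpenDocument",
    ".docx": "Office Open XML",
    ".xlsx": "Office Open XML",
    ".pptx": "Office Open XML",
    ".fodt": "ODFXML",
    ".fods": "ODFXML",
    ".fodp": "ODFXML",
    ".pdf": "PDF",
    ".eml": "EML",
    ".html": "HTML",
}


def format_family(path: str) -> Optional[str]:
    """Return the format family for `path`, or None if it is not listed."""
    # Compare case-insensitively so "Report.DOCX" matches ".docx".
    return SUPPORTED_FORMATS.get(Path(path).suffix.lower())
```

Only the last suffix is checked, so a name such as `archive.tar.bz2` is looked up as `.bz2` and reported as unsupported.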

DocToText can also be used for searching, indexing and archiving, as a fast console viewer, or to recover text from corrupted documents. It can also extract text from annotations (comments) and read metadata such as the author, last modification date or number of pages.



User Reviews

    As a note of full disclosure, I sponsored some of the recent development of this application, especially its ability to extract text from corrupt MS Office 2007 files. That being said, this is a fantastic command-line text/data extractor for MS Office files. I have seen it extract text from corrupt DOCX and PPTX files where Word and PowerPoint 2007 themselves fail, and I have also seen it extract data from corrupt XLSX files. It may be less effective at recovering text from corrupt Word 97-2003 (DOC), OpenOffice or RTF files; I have only limited experience with these. The developers have yet to add features for recovering data from Excel 97-2003 files or text from PowerPoint 97-2003 files, corrupt or not. This software is a very effective command-line extractor of text from non-corrupt DOC, DOCX, PPTX, ODT, ODS, ODP and RTF files, as well as of data from XLSX files. Data from XLSX files is returned as tab-separated text rather than in the perhaps more common CSV format. The application is well suited as a simple building block or back end for a GUI or web service that recovers text from corrupt Office 2007 files, and for converters of non-corrupt MS Office, OpenOffice and RTF files. Again, the program may also work with corrupt DOC, ODT, ODS, ODP and RTF files, but I have limited experience with these.

    Posted 09/27/2009
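The review notes that XLSX data comes back as tab-separated text rather than CSV. If CSV is needed, the output can be post-processed. The sketch below shows one way to do this in Python. It assumes that cells are separated by single tabs and rows by newlines, and that no cell contains a tab or line break. `tsv_to_csv` is a hypothetical helper and is not part of DocToText.

```python
import csv
import io


def tsv_to_csv(tsv_text: str) -> str:
    """Convert tab-separated text to CSV.

    Hypothetical helper. Assumes no cell contains a tab or a line break;
    quote characters in the input are treated as literal text.
    """
    # QUOTE_NONE: read quotes as ordinary characters, split only on tabs.
    rows = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    out = io.StringIO()
    # Default QUOTE_MINIMAL quotes cells containing commas, quotes or newlines.
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(rows)
    return out.getvalue()
```

The `csv` module handles quoting on the way out, so a cell such as `Widget, large` becomes `"Widget, large"` in the CSV output. Splitting on commas by hand would break on cells like that.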

Additional Project Details

Intended Audience

Advanced End Users, Developers, End Users/Desktop

Programming Language

C, C++


