SILVERCODERS DocToText

Extracts plain text from documents in all popular formats.

5.0 Stars (1)
27 Downloads (This Week)
Download doctotext-4.0-20140202.tar.bz2
Platforms: Windows, BSD, Mac, Linux

Description

SILVERCODERS DocToText is a utility that converts documents to plain text. It includes a console application and a C/C++ library that allows text extraction to be embedded in other applications.

It supports the following formats:

  • MS Office binary formats: MS Word (DOC), MS Excel (XLS, XLSB), MS PowerPoint (PPT)
  • Rich Text Format (RTF)
  • OpenDocument (also known as ODF and ISO/IEC 26300, full name: OASIS Open Document Format for Office Applications): text documents (ODT), spreadsheets (ODS), presentations (ODP)
  • Office Open XML (ISO/IEC 29500, also called OOXML, OpenXML or MSOOXML) documents: MS Word (DOCX), MS Excel (XLSX), MS PowerPoint (PPTX)
  • iWork
  • ODFXML (FODP, FODS, FODT)
  • PDF
  • EML (emails)
  • HTML

DocToText can also be used for searching, indexing and archiving, as a fast console viewer, or to recover text from corrupted documents. It can also extract text from annotations (comments) and read metadata such as the author, last modification date or number of pages.


User Ratings

★★★★★ 1
★★★★ 0
★★★ 0
★★ 0
★ 0

ease: 0 / 5
features: 0 / 5
design: 0 / 5
support: 0 / 5

User Reviews

  • socrtwo22

    As a note of full disclosure, I sponsored some of the recent development of this application, especially its ability to extract from corrupt MS Office 2007 files. That being said, this is a fantastic command-line text/data extractor for MS Office files. I have seen it extract text from corrupt docx and pptx files where Word and PowerPoint 2007 themselves fail, and I have also seen it extract data from corrupt xlsx files. It may not be as effective at recovering text from corrupt doc (Word 97-2003), Open Office or RTF files; I have only limited experience with these. The developers have yet to add recovery of data from Excel 97-2003 files or text from PowerPoint 97-2003 files, corrupt or not. The software is a very effective command-line extractor of text from non-corrupt doc, docx, pptx, odt, ods, odp and rtf files, as well as data from xlsx files. Data from xlsx files is returned as tab-delimited text files rather than in the perhaps more common CSV format. The app is very well suited as an easy building block or back end for a powerful GUI or web service that extracts from corrupt Office 2007 files, and for converters of non-corrupt MS Office, Open Office and RTF files. Again, it is possible the program will work with corrupt doc, odt, ods, odp and rtf files, but I have limited experience with these.

    Posted 09/27/2009
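
The review above notes that xlsx data comes back as tab-delimited text rather than CSV. For readers who need CSV, a minimal Python sketch of the conversion follows. It is not part of DocToText: the function name `tsv_to_csv` is hypothetical, and the sketch assumes one spreadsheet row per line with cells separated by single tab characters and no tabs or line breaks inside a cell.

```python
import csv
import io


def tsv_to_csv(tsv_text: str) -> str:
    """Convert tab-delimited text to CSV (hypothetical helper, not a DocToText API).

    Assumes one row per line and no tabs or line breaks inside a cell.
    """
    # QUOTE_NONE makes the reader treat quote characters as ordinary cell text.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    out = io.StringIO()
    # The writer adds quoting only where a cell contains a comma or a quote.
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


if __name__ == "__main__":
    sample = "name\tcity\nSmith, J\tOslo\n"
    print(tsv_to_csv(sample), end="")
```

Cells that contain commas are quoted on output, so the column boundaries survive the change of delimiter.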

Additional Project Details

Intended Audience

Advanced End Users, Developers, End Users/Desktop

User Interface

Command-line

Programming Language

C, C++

Registered

2008-10-29