docx2txt

Description

docx2txt is a Perl-based command-line tool that converts Microsoft docx documents to (ASCII) text files. It preserves some formatting and document information that MS text conversion drops, and applies appropriate character conversions.

Features

  • Consists of a core Perl script, wrapper Unix/Windows shell scripts, and a configuration file, with provision for a separate system-wide configuration file and individual user-level configuration files.
  • The Perl script also works with input/output redirection, which makes it useful for viewing docx file content directly in editors such as vim and emacs, and in file browsers such as mc (Midnight Commander).
  • Can recover text from damaged docx documents in many cases (using unzipping programs such as CakeCMD).
  • Justifies short lines, shows hyperlinks, and performs many character conversions (all missing in MS text conversion).
  • Focuses on a good (ASCII) text experience.
  • Installs via Makefiles and a Windows batch file. On non-Windows systems, the scripts and the configuration file can be installed in separate directories.
  • Can conveniently be used to build a web-based docx document conversion service.
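
For readers unfamiliar with the format, the core idea behind tools of this kind can be sketched in a few lines: a docx file is a ZIP archive whose main text lives in `word/document.xml`, so unzipping that member and walking its paragraph elements yields the text. The Python sketch below is only an illustration of that idea and is not docx2txt's own code (docx2txt is written in Perl and handles far more, such as hyperlinks, justification, and character conversions). The function names `docx_to_text` and `make_sample_docx` are hypothetical, and the sample builder produces a stripped-down archive that is enough for this function but not a complete document Word would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Namespace used by WordprocessingML elements such as w:p and w:t.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_to_text(data: bytes) -> str:
    """Return the plain text of a docx given as bytes (hypothetical helper).

    Simplification: nested paragraphs (e.g. inside text boxes) are not
    treated specially and may appear more than once.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        pieces = []
        for node in para.iter():
            if node.tag == f"{{{W_NS}}}t":      # run of text
                pieces.append(node.text or "")
            elif node.tag == f"{{{W_NS}}}tab":  # tab character
                pieces.append("\t")
            elif node.tag == f"{{{W_NS}}}br":   # line break inside a paragraph
                pieces.append("\n")
        paragraphs.append("".join(pieces))
    return "\n".join(paragraphs)


def make_sample_docx(paragraph_texts) -> bytes:
    """Build a minimal archive containing only word/document.xml (demo only)."""
    body = "".join(
        f"<w:p><w:r><w:t>{escape(text)}</w:t></w:r></w:p>"
        for text in paragraph_texts
    )
    document = (
        f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", document.encode("utf-8"))
    return buffer.getvalue()


if __name__ == "__main__":
    sample = make_sample_docx(["First paragraph.", "Second paragraph."])
    print(docx_to_text(sample))
```

Because the function takes bytes rather than a file path, it could read from standard input or from an uploaded file in a web service, which mirrors the redirection and conversion-service use cases listed above.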

User Reviews

  • Posted by socrtwo22 2009-09-24

    This is an excellent extractor of text from docx files. If you use CakeCMD or No-Frills Command Unzipper to unzip the docx files, it will even extract text from corrupt docx files. This works well in a CGI script that provides a text extraction web service, even for corrupt docx files. See my instance at saveofficedata.com.
