Docx2txt is a Perl-based command-line tool that converts Microsoft docx documents to text files. It preserves some formatting and document information (which MS text conversion drops) and applies appropriate character conversions.
- Consists of a (core) Perl script, (wrapper) Unix/Windows shell scripts, and a configuration file, with provision for a system-wide configuration file and separate user-level configuration files.
- The Perl script also works with input/output redirection, which makes it useful for viewing docx file content directly in editors like vim and emacs, and in file browsers like mc (Midnight Commander).
- Can recover text from damaged docx documents in many cases (using unzipping programs such as CakeCMD).
- Justifies short lines, shows hyperlinks, and performs many character conversions (missing in MS text conversion).
- Handles (bullet, decimal, letter, roman) lists along with indentation.
- Installs via Makefiles and a Windows batch file. On non-Windows systems, the scripts and the configuration file can be installed in separate directories.
- Can conveniently be used to build a web-based docx document conversion service.
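For readers unfamiliar with the format, the core idea behind any docx-to-text conversion can be sketched in a few lines of Python. This is only an illustration and not Docx2txt's own Perl code: it relies on the fact that a docx file is a zip archive whose main text lives in word/document.xml, and it ignores lists, hyperlinks, justification, and character conversions entirely. The function names docx_to_text and make_sample_docx are hypothetical, and the sample archive is a stripped-down stand-in for a real document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# WordprocessingML namespace used by elements in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_to_text(data: bytes) -> str:
    """Return the paragraph text of a docx file given as raw bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    lines = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # Join every text run (<w:t>) found inside this paragraph.
        lines.append("".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t")))
    return "\n".join(lines)


def make_sample_docx(paragraphs) -> bytes:
    """Hypothetical helper: build a minimal docx-like archive for the demo."""
    body = "".join(
        f"<w:p><w:r><w:t>{escape(p)}</w:t></w:r></w:p>" for p in paragraphs
    )
    xml = f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("word/document.xml", xml)
    return buf.getvalue()


if __name__ == "__main__":
    sample = make_sample_docx(["First paragraph", "Second paragraph"])
    print(docx_to_text(sample))
```

Because the function takes bytes and returns a string, it could be wired to standard input and output or to a web handler, which is the kind of use the redirection and web service features above are aimed at.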
Docx2txt works perfectly.
Very useful project!
This is an excellent extractor of text from docx files. If you use CakeCMD or No-Frills Command Unzipper to unzip the docx files, it will even extract text from corrupt docx files. It works well in a CGI script that provides a text extraction web service, even for corrupt docx files. See my instance at saveofficedata.com.
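Why a different unzipper can help with corrupt files is easier to see with a small sketch. The Python below is an assumption about the general technique, not a description of how CakeCMD, No-Frills Command Unzipper, or Docx2txt work internally: standard zip readers depend on the central directory at the end of the archive, while a tolerant reader can scan for local file headers and decompress a member directly. The name salvage_member is hypothetical.

```python
import io
import struct
import zipfile
import zlib

LOCAL_SIG = b"PK\x03\x04"                 # local file header signature
HEADER = struct.Struct("<4sHHHHHIIIHH")   # fixed 30-byte part of that header


def salvage_member(data: bytes, name: str) -> bytes:
    """Extract one member by scanning local headers, ignoring the central directory."""
    wanted = name.encode("utf-8")
    pos = data.find(LOCAL_SIG)
    while pos != -1:
        if pos + HEADER.size <= len(data):
            (_sig, _ver, _flags, method, _time, _date, _crc,
             comp_size, _size, name_len, extra_len) = HEADER.unpack_from(data, pos)
            name_start = pos + HEADER.size
            start = name_start + name_len + extra_len
            if data[name_start:name_start + name_len] == wanted:
                if method == 8:
                    # Raw deflate: decoding stops at the end of the stream,
                    # so the size fields in the header are not needed.
                    return zlib.decompressobj(-15).decompress(data[start:])
                if method == 0:
                    # Stored (uncompressed) member.
                    return data[start:start + comp_size]
        pos = data.find(LOCAL_SIG, pos + 1)
    raise KeyError(name)


if __name__ == "__main__":
    payload = b"<doc>recoverable text</doc>"
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("word/document.xml", payload)
    whole = buf.getvalue()
    # Simulate damage by cutting off the central directory.
    broken = whole[:whole.index(b"PK\x01\x02")]
    print(salvage_member(broken, "word/document.xml") == payload)
```

A member whose compressed stream is cut short may still yield the part that could be decoded, while damaged bytes inside the stream raise zlib.error, so a real service would need to handle both cases.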