docx2txt

Perl-based utility to extract formatted text content from MS Docx files

Downloads: 122 (this week)
Download: docx2txt-1.3.tgz
Platforms: Windows, Mac, Linux

Description

Docx2txt is a Perl-based command-line tool that converts Microsoft Word .docx documents to plain text files. It preserves some formatting and document information that Microsoft's own text conversion drops, and performs appropriate character conversions.
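To make the description concrete: a .docx file is a ZIP archive whose main body is stored in `word/document.xml`, with paragraphs in `w:p` elements and text in `w:t` elements inside runs (`w:r`). The Python sketch below is not docx2txt's Perl code; it only illustrates the kind of bare extraction step such a converter performs, without any of the formatting, list, or hyperlink handling listed under Features. The function names `docx_paragraphs` and `docx_to_text` are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml, in ElementTree's {uri} form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(source):
    """Yield the plain text of each w:p paragraph in a .docx file (hypothetical helper).

    `source` may be a file path, a seekable binary file object, or raw bytes
    (e.g. data read from a redirected stdin).
    """
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    for para in root.iter(W + "p"):
        parts = []
        # Only look inside runs (w:r): w:tab elements under paragraph
        # properties define tab stops, they are not tab characters.
        for run in para.iter(W + "r"):
            for child in run:
                if child.tag == W + "t":
                    parts.append(child.text or "")
                elif child.tag == W + "tab":
                    parts.append("\t")
                elif child.tag in (W + "br", W + "cr"):
                    parts.append("\n")
        yield "".join(parts)


def docx_to_text(source):
    """Join paragraph texts with newlines (no formatting, lists, or links)."""
    return "\n".join(docx_paragraphs(source))
```

Usage might look like `print(docx_to_text("report.docx"))`, or `docx_to_text(sys.stdin.buffer.read())` to mimic input redirection; the bytes are read first because `zipfile` needs a seekable stream. This simplified walk would emit text from nested paragraphs (for example inside text boxes) twice.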

Features

  • Consists of a core Perl script, wrapper Unix/Windows shell scripts, and a configuration file, with provision for maintaining a separate system-wide configuration file and individual user-level configuration files.
  • The Perl script also works with input/output redirection, which makes it useful for viewing docx content directly in editors such as vim and emacs, and in file browsers such as mc (Midnight Commander).
  • Can recover text from damaged docx documents in many cases, when used with tolerant unzipping programs such as CakeCMD.
  • Provides short-line justification, hyperlink display, and many character conversions that are missing from MS text conversion.
  • Handles bullet, decimal, letter, and roman-numeral lists, along with indentation.
  • Installs via Makefiles or a Windows batch file. On non-Windows systems, the scripts and the configuration file can be installed in separate directories.
  • Can conveniently be used to build a web-based docx conversion service.
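List handling is one place where a bare text dump falls short. The Python sketch below shows one way to render list markers in plain text, assuming the numbering format and nesting level of each item are already known. The format names mirror OOXML `w:numFmt` values, but the helpers `to_roman`, `list_marker`, and `render_list_item`, the `*` bullet, and the two-space indent are hypothetical choices, not docx2txt's actual output.

```python
# Hypothetical helpers; format names follow OOXML w:numFmt values.
ROMAN = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(n):
    """Convert a positive integer to upper-case Roman numerals (greedy)."""
    out = []
    for value, symbol in ROMAN:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def list_marker(n, fmt):
    """Marker for the n-th (1-based) item of a list with numbering format `fmt`."""
    if n < 1:
        raise ValueError("list items are numbered from 1")
    if fmt == "bullet":
        return "*"
    if fmt == "decimal":
        return f"{n}."
    if fmt in ("lowerLetter", "upperLetter"):
        # a..z, then this sketch repeats the letter: 27 -> aa, 28 -> bb, ...
        letter = chr(ord("a") + (n - 1) % 26) * ((n - 1) // 26 + 1)
        return (letter.upper() if fmt == "upperLetter" else letter) + "."
    if fmt in ("lowerRoman", "upperRoman"):
        roman = to_roman(n)
        return (roman.lower() if fmt == "lowerRoman" else roman) + "."
    raise ValueError(f"unsupported numbering format: {fmt}")


def render_list_item(text, n, fmt, level=0, indent="  "):
    """Indent by nesting level, then prefix the item text with its marker."""
    return f"{indent * level}{list_marker(n, fmt)} {text}"
```

For example, `render_list_item("Scope", 4, "upperRoman", level=2)` returns `"    IV. Scope"`. Keeping marker generation separate from indentation lets the same helper serve every nesting level.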

User Ratings

5 stars: 3
4 stars: 0
3 stars: 0
2 stars: 0
1 star: 0

Ease, features, design, support: 0 / 5 each

User Reviews

  • tfileme

    Docx2txt works perfectly.

    Posted 05/13/2013
  • codeprese

    Very useful project!

    Posted 06/01/2012
  • socrtwo22

    This is an excellent extractor of text from docx files. If you use CakeCMD or No-Frills Command Unzipper to unzip the docx files, it will even extract text from corrupt docx files. This works well in a CGI script providing a text extraction web service of even corrupt docx files. See my instance at saveofficedata.com.

    Posted 09/24/2009

Additional Project Details

Intended Audience

End Users/Desktop

User Interface

Command-line

Programming Language

Perl, Unix Shell

Registered

2008-07-29