Docx2Txt v1.0 released

This release focuses mainly on user interaction. The following new features have been added.

1. A Windows wrapper batch file, similar to the wrapper shell script, with support for the CakeCmd command-line unzipper.

When the CakeCmd unzipper is used, the batch file internally renames the .docx file to a .zip file, unzips its content, extracts the document text via the perl script, then cleans up and renames the file back to .docx.

2. The input argument to the core perl script and to the wrapper shell and batch scripts can also be a directory name, provided that the directory holds the unzipped content of a .docx file.

This feature is useful for Windows users who do not have a command-line unzipper like Unzip/CakeCmd. Also, since the core perl script requires an unzip utility with an option to send the extracted file to stdout, this feature makes it possible for the wrapper batch file to use the CakeCmd unzipper.

3. Configuration file support for easy control over settings such as the path to the unzip utility, the line width for short line justification, etc.

4. A Windows installation batch file that can automatically set the needed paths in the installed files, depending upon whether it is supplied a valid path to the perl utility during installation.
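
The file-or-directory input described in item 2 can be illustrated with a short sketch. This is not the docx2txt perl code; it is a Python illustration, and the function name read_document_xml is hypothetical. It assumes the main document part is stored at word/document.xml, which is the usual location in a .docx package.

```python
import os
import zipfile


def read_document_xml(path):
    """Return the raw bytes of word/document.xml.

    `path` may be a .docx file or a directory that holds the
    unzipped content of a .docx file (hypothetical helper,
    not part of docx2txt itself).
    """
    if os.path.isdir(path):
        # Directory input: the archive was unzipped beforehand,
        # so no unzip utility is needed at this point.
        xml_path = os.path.join(path, "word", "document.xml")
        with open(xml_path, "rb") as handle:
            return handle.read()

    # File input: a .docx file is a zip archive, so it can be read
    # directly without renaming it to .zip first.
    with zipfile.ZipFile(path) as archive:
        return archive.read("word/document.xml")
```

Either call form, read_document_xml("report.docx") or read_document_xml("report_unzipped"), returns the same XML, which is the property that lets a wrapper unzip with any tool and pass the resulting directory along.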

The following updates have been made in this release.

1. A hyperlink is not displayed if the hyperlink and the hyperlinked text are the same, even if the user has enabled hyperlink display via configuration. This avoids unnecessary duplication of content in the text representation of the .docx document.

2. Improved handling of short line justification, which now covers many cases that the earlier approach missed.

3. Earlier versions did not handle path names containing spaces; this issue is fixed in this version.
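
The hyperlink rule in item 1 can be sketched as follows. This is a minimal Python illustration, not the perl implementation; the function name render_hyperlink, the show_links flag, and the bracketed output format are assumptions made for the example.

```python
def render_hyperlink(text, url, show_links=True):
    """Return the text form of a hyperlinked run.

    Hypothetical helper: the "[HYPERLINK: ...]" output format is an
    assumption chosen for illustration only.
    """
    # Link display disabled in the configuration: emit the text only.
    if not show_links:
        return text

    # Link and linked text are identical: showing both would just
    # duplicate the content, so emit the text once.
    if text.strip() == url.strip():
        return text

    return f"{text} [HYPERLINK: {url}]"
```

With link display enabled, a run whose visible text is already the URL is emitted once, while a run such as "project page" linked to a URL keeps both the text and the link.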

Please refer to the updated README and INSTALL documentation for more details.

Posted by Sandeep Kumar 2009-10-05
