It would be nice if WordPerfect documents could be added to the list of document formats supported by DocFetcher. However, to the best of my knowledge there's currently no Java library for WordPerfect files, only a C/C++ library named 'libwpd', which, AFAIK, is used in AbiWord and OpenOffice.org and which therefore can be considered mature and stable enough.
In my opinion, the best way to add WordPerfect support to DocFetcher is to write a JNI bridge to libwpd. A good tutorial for JNI is here:
http://java.sun.com/docs/books/jni/html/jniTOC.html
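To make the idea of a JNI bridge a bit more concrete, here is a rough sketch of what the Java side might look like. The class name WpdBridge, the library name "wpdjni" and the native method extractText are all made up for illustration; none of them exist in libwpd or DocFetcher, and the C++ side that would actually call into libwpd is not shown.

```java
import java.io.File;

// Minimal sketch of the Java half of a JNI bridge (all names are hypothetical).
final class WpdBridge {

    // Assumed name of the native library, e.g. wpdjni.dll or libwpdjni.so.
    private static final String LIB_NAME = "wpdjni";

    // Try to load the native library once; remember whether that worked.
    private static final boolean LOADED = tryLoad();

    private static boolean tryLoad() {
        try {
            System.loadLibrary(LIB_NAME);
            return true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            // Library not on java.library.path, or loading not permitted.
            return false;
        }
    }

    static boolean isAvailable() {
        return LOADED;
    }

    // Would be implemented in C++ on top of libwpd (hypothetical signature).
    private static native String extractText(String path);

    // Returns the text of the given file, or fails if the bridge is missing.
    static String getText(File file) {
        if (!LOADED) {
            throw new IllegalStateException(
                    "Native library '" + LIB_NAME + "' could not be loaded");
        }
        return extractText(file.getAbsolutePath());
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.out.println("Usage: java WpdBridge <file.wpd>");
            return;
        }
        if (!isAvailable()) {
            System.out.println("Native bridge not found; it has to be built first.");
            return;
        }
        System.out.println(getText(new File(args[0])));
    }
}
```

Loading the library in a guarded static helper means the class can still be used when the native part is missing, so the caller can simply skip WordPerfect files instead of crashing.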
The first steps of this undertaking do not require integration into the DocFetcher source code, just a simple non-GUI Java program that extracts text from a WordPerfect document and writes the output to the console using System.out.println(..).
// Basic text extraction, something like this:
InputStream in = new FileInputStream(new File("someWPFile.wpd"));
WordPerfectDocument wpDoc = new WordPerfectDocument(in);
String text = wpDoc.getText();
// Extraction of meta data:
String title = wpDoc.getTitle();
String author = wpDoc.getAuthor();
String keywords = wpDoc.getKeywords();
In other words, there should be a "WordPerfectDocument" Java class with the following members:
Some additional requirements:
About the libwpd library:
It turns out that all the components needed for this feature can be found here:
http://sourceforge.net/projects/libwpd
More precisely, we need the modules "libwpd2" and "libwpd2-bindings" from the CVS repository of that project. I've managed to build them on Linux (it was a horrible nightmare...) and now I could use some help with the Windows builds.
I managed to compile libwpd2 with MSVC2008 (a horrible nightmare too).
So now I have these libs:
and these tools:
I have taken a look at libwpd2-bindings; it uses C# and SWIG to build. I'm afraid it's beyond my skills (and time).
I saw you asked the libwpd team for a release. Did you get it?
Sorry for the painful experience :( I was hoping the compilation would be far less troublesome for an experienced C++ coder. I'll spare you the upcoming nightmare with libwpd2-bindings ;)
Please attach your files to this tracker if they aren't too big; otherwise, commit them to the /lib folder in the DocFetcher SVN repository.
And there was no sign of a libwpd release in the foreseeable future, so there seems to be no way around the compilation.
The two lib files have been committed to the /lib folder: