Hi,
I'm using chm4j to index CHM files in a desktop search application (http://sourceforge.net/projects/docfetcher), and the library seems fairly decent so far despite its early stage. There's only one small problem: there seems to be no way to determine the file type of the CHMEntry objects. This is needed for basic text extraction, because you want to skip all binary files, e.g. image files.
At the moment I'm working around this limitation by applying a regex pattern to the output (something like ".*<html>.*</html>.*") to determine whether an entry is an HTML file or something else, but I'd prefer a less "hackish" solution.
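For reference, the workaround looks roughly like the sketch below. The class and method names (HtmlSniffer, looksLikeHtml) are made up for illustration, and it assumes the entry content is already available as a String; no chm4j API is involved. Note that the pattern needs the DOTALL flag, since otherwise "." does not match line breaks and multi-line HTML would be rejected.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of chm4j: guesses whether extracted
// content is HTML by looking for an <html> ... </html> pair.
class HtmlSniffer {

    // DOTALL lets '.' match line breaks; CASE_INSENSITIVE accepts <HTML>.
    // [^>]* tolerates attributes on the opening tag, e.g. <html lang="en">.
    private static final Pattern HTML = Pattern.compile(
            "<html\\b[^>]*>.*</html>",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Returns true if the text contains an opening and a closing html tag.
    static boolean looksLikeHtml(String text) {
        return HTML.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeHtml("<html>\n<body>Hello</body>\n</html>")); // true
        System.out.println(looksLikeHtml("PNG binary data"));                     // false
    }
}
```

This is still only a heuristic: HTML fragments without an <html> element are missed, and a binary file that happens to contain both tags would be misclassified.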
Best regards,
Tran Nam Quang