Determine file type of CHMEntry

Status: Alpha

Brought to you by: le_yams

#1 Determine file type of CHMEntry

Status: open

Owner: Yann D'Isanto

Labels: None

Priority: 5

Updated: 2009-01-07

Created: 2008-08-02

Creator: Nam-Quang Tran

Private: No

Hi,

I'm using chm4j to index CHM files in a desktop search application (http://sourceforge.net/projects/docfetcher), and the library seems fairly decent so far despite its early stage. There's only one small problem: there seems to be no way to determine the file type of a CHMEntry object. This is much needed for basic text extraction, because you want to skip all binary files, e.g. image files.
At the moment I'm working around this limitation by applying a regex pattern to the extracted output (something like ".*<html>.*</html>.*") to decide whether an entry is an HTML file or something else, but I'd prefer a less "hackish" solution.

Best regards
Tran Nam Quang
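
For illustration only, the kind of check described in the request could be sketched as below. This is not part of chm4j: the class name EntryTypeSniffer and its methods are hypothetical, and the sketch assumes the caller can already get an entry's path as a String and its leading bytes as a byte[] by whatever means the library offers. It combines an extension test with a rough content sniff instead of running a regex over the whole output.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/**
 * Hypothetical helper (not part of chm4j): guesses whether a CHM entry
 * is HTML from its path and its first bytes.
 */
final class EntryTypeSniffer {

    private EntryTypeSniffer() {}

    /** True if the path ends with a common HTML extension (case-insensitive). */
    static boolean hasHtmlExtension(String path) {
        String lower = path.toLowerCase(Locale.ROOT);
        return lower.endsWith(".html") || lower.endsWith(".htm");
    }

    /**
     * Rough binary check: a NUL byte within the first 512 bytes.
     * Note: this also flags UTF-16 encoded text as binary.
     */
    static boolean looksBinary(byte[] data) {
        int limit = Math.min(data.length, 512);
        for (int i = 0; i < limit; i++) {
            if (data[i] == 0) {
                return true;
            }
        }
        return false;
    }

    /** Heuristic: non-binary content with an HTML extension or an HTML-like header. */
    static boolean looksLikeHtml(String path, byte[] data) {
        if (looksBinary(data)) {
            return false;
        }
        if (hasHtmlExtension(path)) {
            return true;
        }
        // Decode only the head of the content; ISO-8859-1 maps every byte to a char.
        int limit = Math.min(data.length, 1024);
        String head = new String(data, 0, limit, StandardCharsets.ISO_8859_1)
                .toLowerCase(Locale.ROOT);
        return head.contains("<html") || head.contains("<!doctype html");
    }
}
```

An indexer would call EntryTypeSniffer.looksLikeHtml(path, firstBytes) for each entry and skip those that return false. Only the start of each entry is inspected, so large binary entries need not be read in full.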

Discussion

Yann D'Isanto - 2009-01-07

assigned_to: nobody --> le_yams