I can send anyone the docs I have as examples to help debug if I am given an address that will accept attachments.
 
I have the ppt, xls, and pdf files and would like to provide any help I can.
 
From: Christiaan Fluit <christiaan.fluit@ad...> - 2007-01-22 05:44
Kevin C. Bombardier wrote:
> INFO: regular POI-based processing failed, falling back to heuristic
> string extraction for file:/C:/eclipse/workspace/test/digvlsideslec1.ppt

As I explained in my previous mail, Apache POI apparently failed to
process the document properly. There is little we can do other than
contact the POI developers and give them this example document.
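To give an idea of what "heuristic string extraction" means in general, here
is a minimal sketch that collects runs of printable ASCII bytes, much like the
Unix strings tool. The class name HeuristicStrings and its extract method are
made up for this illustration; this is not Aperture's actual fallback code,
which may work quite differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only, not Aperture's implementation.
public class HeuristicStrings {

    /** Collects every run of at least minLength printable ASCII bytes. */
    public static List<String> extract(byte[] data, int minLength) {
        List<String> result = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (byte b : data) {
            if (b >= 0x20 && b < 0x7F) {
                // Printable ASCII: extend the current run.
                run.append((char) b);
            } else {
                // Any other byte ends the run; keep it only if long enough.
                if (run.length() >= minLength) {
                    result.add(run.toString());
                }
                run.setLength(0);
            }
        }
        // Do not lose a run that reaches the end of the data.
        if (run.length() >= minLength) {
            result.add(run.toString());
        }
        return result;
    }
}
```

A scan like this only sees single-byte text. Text stored as UTF-16 has a zero
byte between the characters and would be missed, so the results of such a
fallback are usually much rougher than a real parse of the document.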

> *******************************************************
> No core record found with ID 6 based on PersistPtr lookup
> No core record found with ID 19 based on PersistPtr lookup
> No core record found with ID 20 based on PersistPtr lookup
>
> *******************************************************

I suspect this is logging output printed by POI. Apparently they print
to System.out or System.err rather than using a logging framework. I am
not sure whether this says anything about the quality of the extraction
results, though. Feel free to enlighten us :)
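If the stray console output is a nuisance, one possible workaround is to
redirect System.out and System.err temporarily while the library call runs and
hand the captured lines to your own logger. The sketch below uses only the
standard JDK; the class name StdStreamCapture, the captureOutput method and the
logger name are invented for this example, and nothing like it is part of
Aperture or POI as far as I know.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical helper, shown only to illustrate the idea.
public class StdStreamCapture {

    /** Runs the task with System.out and System.err redirected and returns the non-empty lines printed. */
    public static List<String> captureOutput(Runnable task) {
        PrintStream originalOut = System.out;
        PrintStream originalErr = System.err;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream capture = new PrintStream(buffer, true);
        System.setOut(capture);
        System.setErr(capture);
        try {
            task.run();
        } finally {
            // Always restore the original streams, even if the task throws.
            capture.flush();
            System.setOut(originalOut);
            System.setErr(originalErr);
        }
        List<String> lines = new ArrayList<>();
        for (String line : buffer.toString().split("\\r?\\n")) {
            if (!line.trim().isEmpty()) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in for a library call that prints directly to the console.
        List<String> lines = captureOutput(
                () -> System.err.println("No core record found with ID 6 based on PersistPtr lookup"));
        Logger log = Logger.getLogger("extractor.thirdparty");
        for (String line : lines) {
            log.warning(line);
        }
    }
}
```

Note that System.setOut and System.setErr are global to the JVM, so this
captures output from all threads while the task runs. It is only reasonable
when documents are processed one at a time.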

> INFO: regular POI-based processing failed, falling back to heuristic
> string extraction for file:/C:/eclipse/workspace/test/dsmlecture2.ppt

See above.

> ***************************************************************
> Warning: FontCollection child wasn't a FontEntityAtom, was
> org.apache.poi.hslf.record.UnknownRecordPlaceholder@139c41d
> Warning: FontCollection child wasn't a FontEntityAtom, was
> org.apache.poi.hslf.record.UnknownRecordPlaceholder@17a6a4b
> Warning: FontCollection child wasn't a FontEntityAtom, was
> org.apache.poi.hslf.record.UnknownRecordPlaceholder@1d889aa
>
> ***************************************************************

POI again, see above.

> WARNING: ExtractorException while processing
> file:/C:/eclipse/workspace/test/jlm-dpc-oasis.pdf
> org.semanticdesktop.aperture.extractor.ExtractorException:
> java.io.IOException: Bad Dictionary Declaration
> org.pdfbox.io.PushBackInputStream@7a7686

This time it is PDFBox, the library used for extracting PDF text and
metadata, that complains about an invalid document structure. Does the
output seem correct when you inspect the file using the File Inspector?
If not, it may be worthwhile to contact Ben Litchfield, the PDFBox
maintainer, and show him this document. See pdfbox.org for details.


Regards,

Chris