## #32 PDF module error with TeX-created documents

None
pending-fixed
5
2016-09-07
2012-02-28
Gary McGath
No

User Chris Yocum reports:

Anyway, here is the output I am getting. You can try this on any TeX-generated document and it should give you the same result.

```
java.lang.ClassCastException: edu.harvard.hul.ois.jhove.module.pdf.PdfSimpleObject cannot be cast to edu.harvard.hul.ois.jhove.module.pdf.PdfDictionary
	at … Source)
	at edu.harvard.hul.ois.jhove.module.PdfModule.parse(Unknown Source)
	at edu.harvard.hul.ois.jhove.JhoveBase.processFile(Unknown Source)
	at edu.harvard.hul.ois.jhove.JhoveBase.process(Unknown Source)
	at edu.harvard.hul.ois.jhove.JhoveBase.dispatch(Unknown Source)
	at Jhove.main(Unknown Source)
```

## Discussion

• Gary McGath - 2012-09-05

Could you attach a file that exhibits this problem?

• Gary McGath - 2012-11-09

Does this work for you with JHOVE 1.8?

• Gary McGath - 2012-11-09
• status: open --> pending

• Thomas Fischer - 2013-03-04

I can confirm this bug, although my file is not TeX-generated; it was produced by Acrobat Distiller. The file is attached.
Here is my complete output:

```
Jhove (Rel. 1.9, 2012-12-17)
Date: 2013-03-04 13:59:26 CET
ReportingModule: PDF-hul, Rel. 1.7 (2012-08-12)
LastModified: 2013-01-04 12:22:13 CET
Size: 80219
Format: PDF
Version: 1.6
Status: Not well-formed
SignatureMatches:
PDF-hul
ErrorMessage: Unexpected error in findFonts: java.lang.ClassCastException: edu.harvard.hul.ois.jhove.module.pdf.PdfSimpleObject cannot be cast to edu.harvard.hul.ois.jhove.module.pdf.PdfDictionary
Offset: 1849
MIMEtype: application/pdf
Objects: 0
FreeObjects: 1
DocumentCatalog:
PageLayout: SinglePage
PageMode: UseNone
Filters:
FilterPipeline: FlateDecode
Fonts:
TrueType:
Font:
BaseFont: CBMFOF+Garamond
FontSubset: true
FirstChar: 32
LastChar: 246
FontDescriptor:
FontName: CBMFOF+Garamond
Flags: Serif, Nonsymbolic
FontBBox: -139, -307, 1063, 986
FontFile2: true
Encoding: WinAnsiEncoding
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:format>application/pdf</dc:format>
<dc:creator>
<rdf:Seq>
<rdf:li>Bolagsverket</rdf:li>
</rdf:Seq>
</dc:creator>
<dc:title>
<rdf:Alt>
<rdf:li xml:lang="x-default">Produktbeskrivning P25_Personinformation</rdf:li>
</rdf:Alt>
</dc:title>
</rdf:Description>
<xmp:CreateDate>2008-10-13T15:55:07+02:00</xmp:CreateDate>
<xmp:CreatorTool>PScript5.dll Version 5.2.2</xmp:CreatorTool>
<xmp:ModifyDate>2012-08-17T15:56:07+02:00</xmp:ModifyDate>
</rdf:Description>
<pdf:Producer>Acrobat Distiller 8.1.0 (Windows)</pdf:Producer>
</rdf:Description>
<xmpMM:InstanceID>uuid:dde7d516-b11d-4d86-be2a-5cc56c489a1d</xmpMM:InstanceID>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
Pages:
Page:
Label: 1
Page:
Label: 2
Page:
Label: 3
Page:
Label: 4
Page:
Label: 5
Page:
Label: 6
Page:
Label: 7
```


• Gary McGath - 2013-03-04

JHOVE is tripping up because it sees a keyword where it expects a font dictionary in a page node's resources. As far as I can tell from the spec, this is invalid PDF. I've changed the code so that, instead of throwing an exception, it reports that it did not find a font dictionary. The fix is in the checked-in PdfModule.java.

This seems to imply that many TeX-generated PDFs are broken. If there's something I've missed and a keyword object is valid in this context, please let me know. At least now the error message is more to the point, and there won't be a stack dump.
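The change described above amounts to checking an object's type before casting it, rather than casting blindly and letting `ClassCastException` escape. The sketch below is a simplified illustration under stated assumptions: `PdfObj`, `Keyword`, `Dict`, `FontDictCheck`, and `fontDictionaryOf` are hypothetical stand-ins, not JHOVE's real classes (the real types in the stack trace are `PdfSimpleObject` and `PdfDictionary`), and the error text is illustrative only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified object model (NOT JHOVE's actual classes).
interface PdfObj {}

// A bare keyword or other simple token, like the one found where a dictionary was expected.
final class Keyword implements PdfObj {
    final String text;
    Keyword(String text) { this.text = text; }
}

// A PDF dictionary mapping names (without the leading '/') to objects.
final class Dict implements PdfObj {
    final Map<String, PdfObj> entries = new HashMap<>();
}

public class FontDictCheck {
    /**
     * Returns the /Font entry of a page's /Resources dictionary if it is a
     * dictionary. If the entry is absent, returns empty; if it is some other
     * kind of object, records an error and returns empty. Unlike a blind
     * (Dict) cast, this never throws ClassCastException.
     */
    static Optional<Dict> fontDictionaryOf(Dict resources, StringBuilder errors) {
        PdfObj font = resources.entries.get("Font");
        if (font == null) {
            return Optional.empty(); // the /Font entry is optional
        }
        if (font instanceof Dict) {
            return Optional.of((Dict) font);
        }
        // Report the problem instead of crashing with a stack dump.
        errors.append("Expected font dictionary in page resources");
        return Optional.empty();
    }

    public static void main(String[] args) {
        Dict resources = new Dict();
        resources.entries.put("Font", new Keyword("rstChar"));
        StringBuilder errors = new StringBuilder();
        System.out.println(fontDictionaryOf(resources, errors).isPresent()); // false
        System.out.println(errors);
    }
}
```

Returning an `Optional` plus an error sink lets the caller keep validating the rest of the document, which matches the goal of reporting a clearer error instead of aborting.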

• Gary McGath - 2013-03-04
• status: pending --> pending-fixed
• milestone: -->

• Thomas Fischer - 2013-06-05

The fix doesn't seem to cover all cases. I was able to create a PDF file with pdfLaTeX that reproduces the crash in 1.10b2. The crash occurs as soon as I include the MinionPro font (i.e., commenting out the MinionPro package makes JHOVE run fine):

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[lf]{MinionPro}

\begin{document}
ABC
\end{document}
```

The output looks like this:

```
java.lang.ClassCastException: edu.harvard.hul.ois.jhove.module.pdf.PdfSimpleObject cannot be cast to edu.harvard.hul.ois.jhove.module.pdf.PdfDictionary
	at edu.harvard.hul.ois.jhove.module.PdfModule.parse(Unknown Source)
	at edu.harvard.hul.ois.jhove.JhoveBase.processFile(Unknown Source)
	at edu.harvard.hul.ois.jhove.JhoveBase.process(Unknown Source)
	at edu.harvard.hul.ois.jhove.JhoveBase.dispatch(Unknown Source)
	at Jhove.main(Unknown Source)
Jhove (Rel. 1.9, 2013-05-28)
Date: 2013-06-05 10:08:04 CEST
RepresentationInformation: /tmp/test.pdf
ReportingModule: PDF-hul, Rel. 1.7 (2012-08-12)
LastModified: 2013-06-05 10:00:09 CEST
Size: 42554
Format: PDF
Status: Not well-formed
SignatureMatches:
PDF-hul
ErrorMessage: No document catalog dictionary
Offset: 0
MIMEtype: application/pdf
```

BTW, both the CVS version and the tarball report version number 1.9 instead of 1.10b2.

• Gary McGath - 2013-06-05

Re Thomas Fischer: I'm not getting a crash, and from the output you've posted it looks as if JHOVE does in fact run to completion after writing out a stack dump. However, either JHOVE isn't processing the file properly, or the file is broken and Acrobat opens it anyway. (This may hinge on fine points of what "broken" means.) When it tries to read the document catalog dictionary, JHOVE gets the keyword "rstChar" instead. This is most likely a fragment of a "FirstChar" keyword.

There is legitimately a bug, but I'm afraid it will have to stay open for version 1.10. Hopefully I or someone else will find a fix for it later.
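A token such as "rstChar" suggests the parser began reading a few bytes into "/FirstChar", i.e. at the wrong offset. In PDF, a cross-reference entry gives the byte offset at which an indirect object's `N G obj` header begins, so one way to test that suspicion is to check whether the bytes at the offset used for the catalog actually start such a header. The sketch below assumes you already have the raw file bytes and the offset in question; `ObjOffsetCheck`, `headerAt`, and `tokenAt` are hypothetical diagnostic helpers, not part of JHOVE.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

public class ObjOffsetCheck {
    // Indirect-object header: object number, generation number, keyword "obj".
    private static final Pattern HEADER = Pattern.compile("^(\\d+)\\s+(\\d+)\\s+obj\\b");

    /** Returns true if an indirect-object header starts exactly at offset. */
    static boolean headerAt(byte[] file, int offset) {
        if (offset < 0 || offset >= file.length) {
            return false;
        }
        int end = Math.min(file.length, offset + 32); // headers are short
        String head = new String(file, offset, end - offset, StandardCharsets.ISO_8859_1);
        return HEADER.matcher(head).find();
    }

    /** Reads the token at offset, stopping at whitespace, NUL, or a PDF delimiter. */
    static String tokenAt(byte[] file, int offset) {
        StringBuilder sb = new StringBuilder();
        for (int i = offset; i < file.length; i++) {
            char c = (char) (file[i] & 0xFF);
            if (c == 0 || Character.isWhitespace(c) || "()<>[]{}/%".indexOf(c) >= 0) {
                break;
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "5 0 obj\n<< /Type /Font /FirstChar 32 >>\nendobj\n";
        byte[] pdf = text.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(headerAt(pdf, 0));      // true: offset points at "5 0 obj"
        int skewed = text.indexOf("FirstChar") + 2; // simulate an offset that is off by a few bytes
        System.out.println(headerAt(pdf, skewed)); // false
        System.out.println(tokenAt(pdf, skewed));  // rstChar
    }
}
```

If the check fails for the catalog offset in the MinionPro test file, the problem would lie in how the offset is computed (for example, a bad cross-reference entry or a misread stream length) rather than in the font handling itself; this is a hypothesis to test, not a diagnosis.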

• Denis Bitouzé - 2013-11-02

> There is legitimately a bug, but I'm afraid it will have to stay open for version 1.10. Hopefully I or someone else will find a fix for it later.

Hi,

Is this bug still present in the current version, JHOVE 1.11?

Best regards.

• Carl Wilson - 2016-09-07

Moved to GitHub for triage and testing