While JHOVE processes this PDF, http://www.fcla.edu/daitss-test/files/01471-213X-12-33-S2.pdf, it runs out of Java heap space. Is there an infinite loop during tag profile checking?
./jhove -c conf/jhove.conf -m pdf-hul ~/Workspace/describe/01471-213X-12-33-S2.pdf
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:45)
at java.lang.StringBuilder.<init>(StringBuilder.java:68)
at edu.harvard.hul.ois.jhove.module.pdf.Tokenizer.getNext(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.Parser.getNext(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.Parser.getNext(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.Parser.readObject(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.Parser.readArray(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.Parser.readObject(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.Parser.readDictionary(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.Parser.readObject(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.Parser.readObjectDef(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.Parser.readObjectDef(Unknown Source)
at edu.harvard.hul.ois.jhove.module.PdfModule.getObject(Unknown Source)
at edu.harvard.hul.ois.jhove.module.PdfModule.resolveIndirectObject(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.StructureElement.isStructElem(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.StructureElement.buildSubtree(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.StructureElement.buildSubtree(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.StructureElement.buildSubtree(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.StructureTree.getChildren(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.StructureTree.<init>(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.TaggedProfile.satisfiesThisProfile(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.PdfProfile.satisfiesProfile(Unknown Source)
at edu.harvard.hul.ois.jhove.module.PdfModule.parse(Unknown Source)
at edu.harvard.hul.ois.jhove.JhoveBase.processFile(Unknown Source)
at edu.harvard.hul.ois.jhove.JhoveBase.process(Unknown Source)
at edu.harvard.hul.ois.jhove.JhoveBase.dispatch(Unknown Source)
I have the same problem with the following PDF file:
http://docserv.uni-duesseldorf.de/servlets/DerivateServlet/Derivate-25614
Are there any updates to that issue?
I'm working on this issue. One user showed that, with huge amounts of patience and memory, at least some PDF files that appear to be stuck in an infinite loop do complete after several hours. The StructureTree object can take a huge amount of memory for some files, but once it's built, only a couple of flags that were set during its construction are checked. This suggests that the whole tree doesn't have to be in memory at once. I hope to have a fix that takes this into account before too long.
Thanks for the additional example. I'll use it in testing.
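To illustrate the idea of checking flags without keeping the whole tree in memory, here is a rough sketch. It is not JHOVE code: StructureScan, Flags, ChildResolver and scan are hypothetical names, and ChildResolver only stands in for resolving an indirect object to the object numbers of its children. The walk keeps a stack of pending object numbers and the flags, never the node objects themselves. The set of seen object numbers is an extra precaution I'm assuming would be useful against malformed files whose references loop back on themselves.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; none of these names are JHOVE classes. */
public class StructureScan {

    /** Facts gathered during the walk (illustrative only). */
    static final class Flags {
        int visited;        // number of distinct objects examined
        boolean revisited;  // true if some object was referenced more than once
    }

    /** Stand-in for resolving an object number to its children's object numbers. */
    interface ChildResolver {
        List<Integer> childrenOf(int objectNumber);
    }

    /** Walks the structure iteratively, retaining only object numbers and flags. */
    static Flags scan(int root, ChildResolver resolver) {
        Flags flags = new Flags();
        Set<Integer> seen = new HashSet<>();
        Deque<Integer> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            int current = pending.pop();
            if (!seen.add(current)) {
                // Already examined: record it and do not descend again.
                flags.revisited = true;
                continue;
            }
            flags.visited++;
            for (int child : resolver.childrenOf(current)) {
                pending.push(child);
            }
        }
        return flags;
    }

    public static void main(String[] args) {
        // Toy object graph: 1 -> 2, 3; 2 -> 3; 3 -> 1 (a loop back to the root).
        Map<Integer, List<Integer>> objects = Map.of(
            1, List.of(2, 3),
            2, List.of(3),
            3, List.of(1));
        Flags f = scan(1, n -> objects.getOrDefault(n, List.of()));
        System.out.println(f.visited + " " + f.revisited); // prints "3 true"
    }
}
```

Each node's data can be discarded as soon as its children have been queued, so memory grows with the number of object numbers seen rather than with the size of the fully built tree.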
Last edit: Gary McGath 2013-04-29
It's nice to hear that. Many thanks in advance.
Unfortunately, this bug still exists in JHOVE 1.10.
Moved to GitHub for triage and testing.