Menu

#50 run out of java heap space while JHOVE process a specific pdf against tag profiles

open
nobody
None
5
2016-09-07
2013-02-06
Carol
No

While JHOVE processes this pdf, http://www.fcla.edu/daitss-test/files/01471-213X-12-33-S2.pdf, it runs out of all JAVA heap space. Is there an infinite loop during tag profile checking?

./jhove -c conf/jhove.conf -m pdf-hul ~/Workspace/describe/01471-213X-12-33-S2.pdf
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.lang.AbstractStringBuilder.(AbstractStringBuilder.java:45)
at java.lang.StringBuilder.(StringBuilder.java:68)
at edu.harvard.hul.ois.jhove.module.pdf.Tokenizer.getNext(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.Parser.getNext(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.Parser.getNext(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.Parser.readObject(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.Parser.readArray(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.Parser.readObject(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.Parser.readDictionary(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.Parser.readObject(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.Parser.readObjectDef(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.Parser.readObjectDef(Unknown Source)
at edu.harvard.hul.ois.jhove.module.PdfModule.getObject(Unknown Source)
at edu.harvard.hul.ois.jhove.module.PdfModule.resolveIndirectObject(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.StructureElement.isStructElem(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.StructureElement.buildSubtree(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.StructureElement.buildSubtree(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.StructureElement.buildSubtree(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.StructureTree.getChildren(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.StructureTree.(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.TaggedProfile.satisfiesThisProfile(Unknown Source)
at edu.harvard.hul.ois.jhove.module.pdf.PdfProfile.satisfiesProfile(Unknown Source)
at edu.harvard.hul.ois.jhove.module.PdfModule.parse(Unknown Source)
at edu.harvard.hul.ois.jhove.JhoveBase.processFile(Unknown Source)
at edu.harvard.hul.ois.jhove.JhoveBase.process(Unknown Source)
at edu.harvard.hul.ois.jhove.JhoveBase.dispatch(Unknown Source)

Discussion

  • Gary McGath

    Gary McGath - 2013-04-29

    I'm working on this issue. One user showed that with huge amounts of patience and memory, at least some PDF files that appear to be in an infinite loop are completed after several hours. The StructureTree object can take a huge amount of memory for some files, but once it's build only a couple of flags that were set during its construction are checked. This suggests that the whole tree doesn't have to be in memory at once. I hope to have a fix that takes this into account before too long.

    Thanks for the additional example. I'll use it in testing.

     

    Last edit: Gary McGath 2013-04-29
  • Stefan Hein

    Stefan Hein - 2013-04-29

    It's nice to hear that. Many thanks in advance.

     
  • Stefan Hein

    Stefan Hein - 2013-08-20

    Unfortunately this reported bug is still existing in JHOVE 1.10.

     
  • Carl Wilson

    Carl Wilson - 2016-09-07

    Moved to GitHub for triage and testing.

     

Log in to post a comment.