
#71 Malformed dictionary

0.1.2.1
closed-out-of-date
None
5
2015-05-25
2015-05-14
No

Hi, I get the following error when I try to extract text from a PDF.
I'm sorry, but I can't attach the problematic PDF because it contains confidential information.

Error:

EXCEPTION: org.pdfclown.util.parsers.PostScriptParseException: Malformed dictionary.
   at org.pdfclown.util.parsers.PostScriptParser.MoveNext()
   at org.pdfclown.tokens.BaseParser.MoveNext()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseOperation()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseContentObject()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseContentObjects()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseContentObject()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseContentObjects()
   at org.pdfclown.documents.contents.Contents.Load()
   at org.pdfclown.documents.contents.Contents..ctor(PdfDirectObject baseObject, IContentContext contentContext)
   at org.pdfclown.documents.contents.Contents.Wrap(PdfDirectObject baseObject, IContentContext contentContext)
   at org.pdfclown.documents.Page.get_Contents()
   at org.pdfclown.documents.contents.ContentScanner..ctor(IContentContext contentContext)
   at org.pdfclown.tools.TextExtractor.Extract(IContentContext contentContext)

Discussion

  • Stefano Chizzolini

    Unfortunately, this stack trace isn't informative enough to pinpoint the actual problem. Could you send your PDF sample to my private email address (or through an encrypted channel if you have serious confidentiality concerns)? Alternatively, you can extract the relevant content in debug mode: when the PostScriptParseException is intercepted, do the following in the stack frame of PostScriptParser.MoveNext():

    1) Read the position of the stream cursor:

    stream.Position
    

    2) Execute the following code to extract the content chunk:

    stream.Seek(0);
    stream.ReadString((int)stream.Length);
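    Once you have both values, the malformed dictionary is likely to sit at or shortly before the cursor position. Below is a minimal sketch of a hypothetical helper (ParseErrorContext.Around is NOT part of PDF Clown) that cuts a window of text around the position from step 1 out of the string returned by step 2 and marks where the parser stopped. It assumes ReadString maps each byte to one character, so that the byte offset in stream.Position is also a valid string index; that is an assumption worth verifying on your build.

```csharp
using System;

// Hypothetical debugging helper, NOT part of the PDF Clown API.
// Assumption: the content string maps one character per byte, so the
// byte offset read from stream.Position is also a valid string index.
public static class ParseErrorContext
{
    // Returns up to `radius` characters on each side of `position`,
    // with "@@HERE@@" inserted where the parser cursor stopped.
    public static string Around(string content, long position, int radius = 80)
    {
        if (content == null)
            throw new ArgumentNullException(nameof(content));
        if (radius < 0)
            throw new ArgumentOutOfRangeException(nameof(radius));

        // Clamp the cursor into [0, content.Length] so an out-of-range
        // position still yields a usable excerpt instead of an exception.
        int pos = (int)Math.Max(0L, Math.Min(position, (long)content.Length));
        int start = Math.Max(0, pos - radius);
        // Long arithmetic avoids int overflow when radius is very large.
        int end = (int)Math.Min((long)content.Length, (long)pos + radius);

        return content.Substring(start, pos - start)
            + "@@HERE@@"
            + content.Substring(pos, end - pos);
    }
}
```

    Step 1 must come first: seeking to 0 and reading the whole stream in step 2 moves the cursor, so stream.Position would no longer point at the failure afterwards. If the helper is compiled into your project, it can be evaluated in the debugger's Immediate window as ParseErrorContext.Around(content, pos), where content and pos are placeholder names for the values captured in steps 2 and 1.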
    

    Thank you.

     
  • Willyan Klumb

    Willyan Klumb - 2015-05-25

    Hi,
    What is your private email address?

     
  • Stefano Chizzolini

    You can find it here: http://pdfclown.org/contact-us/

    IMPORTANT: please sync to the latest revision before debugging, as I have made some fixes in the meantime.

    Thank you.

     

    Last edit: Stefano Chizzolini 2015-05-25
  • Willyan Klumb

    Willyan Klumb - 2015-05-25

    I've sent an email with the problematic PDF.

     

    Last edit: Stefano Chizzolini 2015-05-25
  • Stefano Chizzolini

    (Please DON'T post email addresses: web crawlers harvest them to feed spammers' lists!)

     
  • Stefano Chizzolini

    I've just tried your file, and text extraction worked correctly. Did you test that file after syncing with the latest repository changes?

     
  • Stefano Chizzolini

    • status: open --> closed-out-of-date
    • assigned_to: Stefano Chizzolini
     

