#71 Malformed dictionary

Milestone: 0.1.2.1

Status: closed-out-of-date

Owner: Stefano Chizzolini

Labels: None

Priority: 5

Updated: 2015-05-25

Created: 2015-05-14

Creator: Willyan Klumb

Private: No

Hi, I get the following error when I try to extract the text from a PDF.
I'm sorry, but I can't attach the problematic PDF because it contains confidential information.

Error:

EXCEPTION: org.pdfclown.util.parsers.PostScriptParseException: Malformed dictionary.
   at org.pdfclown.util.parsers.PostScriptParser.MoveNext()
   at org.pdfclown.tokens.BaseParser.MoveNext()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseOperation()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseContentObject()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseContentObjects()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseContentObject()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseContentObjects()
   at org.pdfclown.documents.contents.Contents.Load()
   at org.pdfclown.documents.contents.Contents..ctor(PdfDirectObject baseObject, IContentContext contentContext)
   at org.pdfclown.documents.contents.Contents.Wrap(PdfDirectObject baseObject, IContentContext contentContext)
   at org.pdfclown.documents.Page.get_Contents()
   at org.pdfclown.documents.contents.ContentScanner..ctor(IContentContext contentContext)
   at org.pdfclown.tools.TextExtractor.Extract(IContentContext contentContext)

Discussion

Stefano Chizzolini - 2015-05-21

Unfortunately this stack trace isn't sufficiently informative to infer the actual problem: could you send your PDF sample to my private mail (or through an encrypted channel if you have high concerns)? Alternatively, you may extract the relevant content in debug mode: when you intercept the PostScriptParseException, in the stack frame of PostScriptParser.MoveNext() do this:

1) read the position of the stream cursor:

stream.Position

2) execute the following code to extract the content chunk:

stream.Seek(0); stream.ReadString((int)stream.Length);

thank you
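
The two debugger steps above amount to noting the offset at which parsing failed and then dumping the content so the bytes around that offset can be inspected. As a rough illustration of that idea only, here is a minimal Java sketch that works on a plain byte array rather than on PDF Clown's own stream type. The class name `ContentExcerpt`, the method `excerpt` and the sample content are hypothetical and are not part of the library.

```java
import java.nio.charset.StandardCharsets;

public class ContentExcerpt {
    /**
     * Returns up to `radius` bytes on each side of `position`.
     * Hypothetical helper, not a PDF Clown API. ISO-8859-1 is used
     * because it maps every byte to exactly one character, so offsets
     * in the returned text match offsets in the raw bytes.
     */
    public static String excerpt(byte[] content, long position, int radius) {
        // Clamp the reported position into the valid range of the buffer.
        int pos = (int) Math.max(0, Math.min(position, content.length));
        int start = Math.max(0, pos - radius);
        int end = Math.min(content.length, pos + radius);
        return new String(content, start, end - start, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Made-up content fragment with a dictionary that is never closed.
        byte[] content = "BT /F1 12 Tf << /Broken (text) Tj ET"
                .getBytes(StandardCharsets.ISO_8859_1);
        // 13 stands in for the value read from the stream cursor in step 1.
        System.out.println(excerpt(content, 13, 8)); // prints "1 12 Tf << /Brok"
    }
}
```

Sharing only a small window like this, instead of the whole chunk, may be enough to spot the malformed token when the rest of the content is confidential.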

Willyan Klumb - 2015-05-25

Hi,
What is your private email address?

Stefano Chizzolini - 2015-05-25

You can find it here: http://pdfclown.org/contact-us/

IMPORTANT: please sync to the latest revision before debugging, as I made some fixes in the meantime.

thank you

Last edit: Stefano Chizzolini 2015-05-25

Willyan Klumb - 2015-05-25

I've sent an email with the problematic PDF.

Last edit: Stefano Chizzolini 2015-05-25

Stefano Chizzolini - 2015-05-25

(please DON'T post email addresses -- web crawlers harvest them to feed spammers' lists!)

Stefano Chizzolini - 2015-05-25

I've just tried your file, but text extraction worked normally... did you test that file after syncing with the latest repository changes?

Stefano Chizzolini - 2015-05-25

status: open --> closed-out-of-date

assigned_to: Stefano Chizzolini