Malformed dictionary
General-Purpose PDF Library for Java and .NET
Hi, I get the following error when I try to extract the text from a PDF.
I'm sorry, but I can't attach the problematic PDF because it contains confidential information.
Error:
EXCEPTION: org.pdfclown.util.parsers.PostScriptParseException: Malformed dictionary.
   at org.pdfclown.util.parsers.PostScriptParser.MoveNext()
   at org.pdfclown.tokens.BaseParser.MoveNext()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseOperation()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseContentObject()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseContentObjects()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseContentObject()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseContentObjects()
   at org.pdfclown.documents.contents.Contents.Load()
   at org.pdfclown.documents.contents.Contents..ctor(PdfDirectObject baseObject, IContentContext contentContext)
   at org.pdfclown.documents.contents.Contents.Wrap(PdfDirectObject baseObject, IContentContext contentContext)
   at org.pdfclown.documents.Page.get_Contents()
   at org.pdfclown.documents.contents.ContentScanner..ctor(IContentContext contentContext)
   at org.pdfclown.tools.TextExtractor.Extract(IContentContext contentContext)
Unfortunately, this stack trace isn't informative enough to infer the actual problem. Could you send your PDF sample to my private mail (or through an encrypted channel if confidentiality is a serious concern)? Alternatively, you can extract the relevant content in debug mode: when you intercept the PostScriptParseException, do the following in the stack frame of PostScriptParser.MoveNext():
1) read the position of the stream cursor:
2) execute the following code to extract the content chunk:
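The snippets that belonged to these two steps are not preserved in this thread. As a rough illustration of the idea only, here is a minimal Java sketch that cuts a window of bytes around a known cursor position out of a content stream held in memory. The class name ContentChunk, the method around, and the radius parameter are hypothetical and are not part of the PDF Clown API; how you obtain the stream bytes and the cursor position inside the debugger is left open here.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDF Clown: returns the bytes surrounding
// a given cursor position so the chunk that breaks the parser can be inspected.
class ContentChunk {
    /**
     * Returns up to `radius` bytes before and up to `radius` bytes from
     * `position` onward. The result is decoded as ISO-8859-1 so that every
     * byte maps to exactly one character and nothing is lost.
     */
    static String around(byte[] data, long position, int radius) {
        // Clamp the cursor into the valid range [0, data.length].
        int center = (int) Math.max(0, Math.min(position, data.length));
        int start = Math.max(0, center - radius);
        int end = Math.min(data.length, center + radius);
        return new String(data, start, end - start, StandardCharsets.ISO_8859_1);
    }
}
```

For example, with the bytes of "BT /F1 12 Tf << /Broken (Hello) Tj ET" and a cursor at position 13, ContentChunk.around(bytes, 13, 8) returns "1 12 Tf << /Brok". Posting such a chunk instead of the whole file may be enough to diagnose a malformed dictionary without disclosing the rest of the document.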
thank you
Hi,
What is your private mail?
You can find it here: http://pdfclown.org/contact-us/
IMPORTANT: please sync to the latest revision before debugging, as I have made some fixes in the meantime.
thank you
Last edit: Stefano Chizzolini 2015-05-25
I've sent an email with the problematic PDF.
Last edit: Stefano Chizzolini 2015-05-25
(please DON'T post email addresses -- web crawlers harvest them to feed spammers' lists!)
I've just tried your file, but text extraction worked without errors... Did you test that file after syncing with the latest repository changes?