#51 Inline image parsing

Milestone: 0.1.2.1
Status: closed-fixed
Labels: None
Priority: 1
Updated: 2015-03-09
Created: 2013-07-13
Private: No

Inline image parsing does not appear to be correct.

I have a document from a book scanning service (1dollarscan) that contains some tiny inline images (I don't know how their PDFs are created, or what purpose these images serve). The PDF I'm working with is at http://www.ge.tt/7KoZpdl/v/0?c. I apologize for the huge (1.3 GB) file; I tried extracting just the first page with the problem (index 36), but that failed because of these same parsing problems.

Stepping through ContentParser.ParseInlineImage, I can see that the headers are read correctly ({W=1,H=1,IM=True,BPC=1}), but then parsing the body goes wrong.

The body of these tiny images is always (in decimal) 10,0,10,69,73. The current parser calls MoveNext, which consumes all the white space (10,0,10; LF and NUL both count as white space in PDF, so the image's only data byte, 0, is swallowed as well) and then parses the next token (69,73 = EI). The parser is then unable to find the end of the image body, and it consumes the rest of the file byte by byte until it hits an OutOfMemoryException.
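To make the failure concrete, here is a minimal Python sketch (not PDF Clown's actual code) of a greedy white-space skip applied to the bytes that follow the ID operator. The helper name `skip_all_whitespace` is hypothetical; it only approximates what a generic tokenizer step like MoveNext does. Because NUL is one of the six PDF white-space characters, the data byte disappears along with the LF bytes around it.

```python
# PDF white-space characters (ISO 32000-1, Table 1): NUL, HT, LF, FF, CR, SP
PDF_WHITESPACE = {0x00, 0x09, 0x0A, 0x0C, 0x0D, 0x20}


def skip_all_whitespace(data: bytes, pos: int) -> int:
    """Hypothetical greedy skip: advance past every white-space byte."""
    while pos < len(data) and data[pos] in PDF_WHITESPACE:
        pos += 1
    return pos


# Bytes following the ID operator in the report: LF, NUL, LF, 'E', 'I'
body = bytes([10, 0, 10, 69, 73])
pos = skip_all_whitespace(body, 0)
print(pos, body[pos:])  # 3 b'EI' -> the 0x00 data byte was skipped as white space
```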

I looked up the standard, and section 8.9.7 (p. 215) states: "Unless the image uses ASCIIHexDecode or ASCII85Decode as one of its filters, the ID operator shall be followed by a single white-space character, and the next character shall be interpreted as the first byte of image data." (I don't know much about filters, but these image headers have no filter entry, so I assume the exception doesn't apply here.) With this in mind, the call to MoveNext() in ContentParser.cs:180 should probably be stream.ReadByte() instead.
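A minimal Python sketch of the reading rule quoted above, assuming an unfiltered image: consume exactly one white-space byte after ID, then take the data. It also assumes the data length is computed from W, H, BPC and the number of color components, with each row padded to a whole byte (an image mask has one component). Both function names are hypothetical, not PDF Clown API.

```python
from typing import Tuple

PDF_WHITESPACE = {0x00, 0x09, 0x0A, 0x0C, 0x0D, 0x20}


def unfiltered_data_length(width: int, height: int, bpc: int, components: int = 1) -> int:
    """Bytes of unfiltered sample data; each row starts on a byte boundary."""
    row_bytes = (width * components * bpc + 7) // 8
    return row_bytes * height


def read_inline_image_data(stream: bytes, after_id: int, length: int) -> Tuple[bytes, int]:
    """after_id is the offset just past the 'ID' operator (hypothetical helper)."""
    if after_id >= len(stream) or stream[after_id] not in PDF_WHITESPACE:
        raise ValueError("expected a single white-space byte after ID")
    start = after_id + 1  # consume exactly ONE separator byte, not all white space
    end = start + length
    if end > len(stream):
        raise ValueError("truncated inline image data")
    return stream[start:end], end


# The 1x1 image mask from the report
stream = b"BI /W 1 /H 1 /IM true /BPC 1 ID" + bytes([10, 0, 10]) + b"EI"
after_id = stream.index(b"ID") + 2
data, next_pos = read_inline_image_data(stream, after_id, unfiltered_data_length(1, 1, 1))
print(data, stream[next_pos:])  # b'\x00' b'\nEI'
```

With this rule the parser stops right before the white space that precedes EI, so the EI operator is still available to be read as a token.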

There is also a problem at the end of the inline image body: the current parsing code always includes the 'E' of "EI" in the image data.
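When the data length can't be computed up front (for example, with a filter), a common heuristic, which is an assumption here and not a rule from the spec, is to treat "EI" as the end marker only when it is preceded and followed by white space (or end of input). The sketch below, using a hypothetical `find_inline_image_end` helper, returns the data without the separator byte before EI and never includes the 'E' itself. An "EI" pair inside the data that lacks the surrounding white space is skipped.

```python
from typing import Tuple

PDF_WHITESPACE = {0x00, 0x09, 0x0A, 0x0C, 0x0D, 0x20}


def find_inline_image_end(stream: bytes, data_start: int) -> Tuple[bytes, int]:
    """Return (image data, offset just past 'EI'); heuristic, hypothetical helper."""
    pos = data_start
    while True:
        i = stream.find(b"EI", pos)
        if i < 0:
            raise ValueError("no EI operator found")
        before_ok = i > data_start and stream[i - 1] in PDF_WHITESPACE
        after = i + 2
        after_ok = after == len(stream) or stream[after] in PDF_WHITESPACE
        if before_ok and after_ok:
            # Drop the white-space byte separating data from EI; exclude 'E'
            return stream[data_start:i - 1], after
        pos = i + 1  # 'EI' embedded in the data: keep searching


# Body from the report; data starts after the single LF separator
print(find_inline_image_end(bytes([10, 0, 10, 69, 73]), 1))  # (b'\x00', 5)
```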

Since I'm discarding these images anyway, I simply commented out the call to MoveNext as a workaround.

P.S. Great library! I've tried several others and concluded that yours is the most reliable and stable!

Discussion

  • Stefano Chizzolini

    • Group: v1.0_(example) --> 0.1.2.1
    • Priority: 3 --> 1
     
  • Stefano Chizzolini

    Fixed in the 0.1.2-Fix branch (rev 153) and the 0.2.0 trunk (rev 154).

     
  • Stefano Chizzolini

    • status: open --> closed-fixed
     
