#74 Input string was not in a correct format

0.1.2.1
closed-fixed
None
2
2015-05-23
2015-05-14
No

Hi, I get the following error when I try to extract the text from the attached PDF:

EXCEPTION: System.FormatException: Input string was not in a correct format.
   at System.Number.StringToNumber(String str, NumberStyles options, NumberBuffer& number, NumberFormatInfo info, Boolean parseDecimal)
   at System.Number.ParseInt32(String s, NumberStyles style, NumberFormatInfo info)
   at System.Byte.Parse(String s, NumberStyles style, NumberFormatInfo info)
   at org.pdfclown.util.ConvertUtils.HexToByteArray(String data)
   at org.pdfclown.objects.PdfString.set_Value(Object value)
   at org.pdfclown.objects.PdfString..ctor(String value, SerializationModeEnum serializationMode)
   at org.pdfclown.objects.PdfByteString..ctor(String value)
   at org.pdfclown.documents.contents.tokens.ContentParser.ParsePdfObject()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseOperation()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseContentObject()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseContentObjects()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseContentObject()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseContentObjects()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseContentObject()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseContentObjects()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseContentObject()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseContentObjects()
   at org.pdfclown.documents.contents.Contents.Load()
   at org.pdfclown.documents.contents.Contents..ctor(PdfDirectObject baseObject, IContentContext contentContext)
   at org.pdfclown.documents.contents.Contents.Wrap(PdfDirectObject baseObject, IContentContext contentContext)
   at org.pdfclown.documents.Page.get_Contents()
   at org.pdfclown.documents.contents.ContentScanner..ctor(IContentContext contentContext)
   at org.pdfclown.tools.TextExtractor.Extract(IContentContext contentContext)
1 Attachments

Discussion

  • Stefano Chizzolini

    Inline images whose data contained the byte sequence of the end-of-image operator (EI) were truncated at that point. The fix now detects the full end-of-image byte sequence (whitespace, EI, whitespace).

    Fixed on 0.1.2-Fix branch (rev 214) and 0.2.0 trunk (rev 215).

    Thank you.

     
  • Stefano Chizzolini

    • status: open --> closed-fixed
    • Priority: 5 --> 2
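
The end-of-image detection described in the comment above can be illustrated with a small sketch. This is not PDF Clown's code: the function names and the sample content stream are hypothetical, and treating end-of-stream as a valid trailing delimiter is an assumption made here. The whitespace set is the one defined by the PDF specification (NUL, TAB, LF, FF, CR, SPACE).

```python
# PDF whitespace bytes: NUL, TAB, LF, FF, CR, SPACE.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def find_ei_naive(stream: bytes, start: int) -> int:
    """Offset of the first b'EI' at or after start (the truncating behaviour)."""
    return stream.find(b"EI", start)


def find_ei_delimited(stream: bytes, start: int) -> int:
    """Offset of the first b'EI' that is preceded by whitespace and followed
    by whitespace (or, as an assumption of this sketch, end of stream).
    Returns -1 if no such occurrence exists."""
    pos = stream.find(b"EI", start)
    while pos != -1:
        before_ok = pos > start and stream[pos - 1] in PDF_WHITESPACE
        after = pos + 2
        after_ok = after >= len(stream) or stream[after] in PDF_WHITESPACE
        if before_ok and after_ok:
            return pos
        # Not a delimited operator: keep scanning the image data.
        pos = stream.find(b"EI", pos + 1)
    return -1


# Hypothetical content stream: the image data itself contains the bytes "EI".
stream = b"BI /W 2 /H 2 /BPC 8 /CS /G ID \x01EI\x02\x03 EI Q"
start = stream.index(b"ID ") + 3  # first byte of the image data

naive_end = find_ei_naive(stream, start)
fixed_end = find_ei_delimited(stream, start)

truncated_data = stream[start:naive_end]    # stops inside the image data
full_data = stream[start:fixed_end - 1]     # drops the whitespace before EI
```

With the naive search, the bytes left over after the premature cut are handed back to the content parser as if they were ordinary tokens, which is consistent with the parse failure in the stack trace. Requiring the surrounding whitespace is still a heuristic, since binary image data can in principle contain that sequence too.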
     
