PDF Clown / Bugs / #74 Input string was not in a correct format

#74 Input string was not in a correct format

Milestone: 0.1.2.1

Status: closed-fixed

Owner: Stefano Chizzolini

Labels: None

Priority: 2

Updated: 2015-05-23

Created: 2015-05-14

Creator: Willyan Klumb

Private: No

Hi, I get the following error when I try to extract the text from the attached PDF:

EXCEPTION: System.FormatException: Input string was not in a correct format.
   at System.Number.StringToNumber(String str, NumberStyles options, NumberBuffer& number, NumberFormatInfo info, Boolean parseDecimal)
   at System.Number.ParseInt32(String s, NumberStyles style, NumberFormatInfo info)
   at System.Byte.Parse(String s, NumberStyles style, NumberFormatInfo info)
   at org.pdfclown.util.ConvertUtils.HexToByteArray(String data)
   at org.pdfclown.objects.PdfString.set_Value(Object value)
   at org.pdfclown.objects.PdfString..ctor(String value, SerializationModeEnum serializationMode)
   at org.pdfclown.objects.PdfByteString..ctor(String value)
   at org.pdfclown.documents.contents.tokens.ContentParser.ParsePdfObject()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseOperation()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseContentObject()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseContentObjects()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseContentObject()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseContentObjects()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseContentObject()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseContentObjects()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseContentObject()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseContentObjects()
   at org.pdfclown.documents.contents.Contents.Load()
   at org.pdfclown.documents.contents.Contents..ctor(PdfDirectObject baseObject, IContentContext contentContext)
   at org.pdfclown.documents.contents.Contents.Wrap(PdfDirectObject baseObject, IContentContext contentContext)
   at org.pdfclown.documents.Page.get_Contents()
   at org.pdfclown.documents.contents.ContentScanner..ctor(IContentContext contentContext)
   at org.pdfclown.tools.TextExtractor.Extract(IContentContext contentContext)

1 Attachment

Input string was not in a correct format.pdf

Discussion

Stefano Chizzolini - 2015-05-23

Inline images whose data contained the byte sequence EI were truncated, because the first occurrence of EI was taken as the end-of-image operator. The fix now detects the full end-of-image byte sequence (whitespace, EI, whitespace).

Fixed on 0.1.2-Fix branch (rev 214) and 0.2.0 trunk (rev 215).
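
For readers hitting the same symptom, the difference between the two detection strategies can be sketched as follows. This is a minimal illustration in Java, not the code from rev 214/215: the class InlineImageEnd, its methods and the sample content stream are hypothetical, and treating end-of-stream as a valid terminator after EI is an assumption of this sketch.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only; not part of PDF Clown.
public class InlineImageEnd {
    // PDF white-space characters: NUL, HT, LF, FF, CR, SP.
    static boolean isWhitespace(int b) {
        return b == 0x00 || b == 0x09 || b == 0x0A
            || b == 0x0C || b == 0x0D || b == 0x20;
    }

    // Naive scan: stops at the first "EI" byte pair, even when the pair
    // is part of the image data. Returns the index of 'E', or -1.
    static int naiveEnd(byte[] data, int start) {
        for (int i = start; i + 1 < data.length; i++) {
            if (data[i] == 'E' && data[i + 1] == 'I') {
                return i;
            }
        }
        return -1;
    }

    // Stricter scan: "EI" must be preceded by whitespace and followed by
    // whitespace (or by the end of the stream, an assumption made here).
    static int delimitedEnd(byte[] data, int start) {
        for (int i = Math.max(start, 1); i + 1 < data.length; i++) {
            boolean isToken = data[i] == 'E' && data[i + 1] == 'I';
            boolean before = isWhitespace(data[i - 1]);
            boolean after = i + 2 == data.length || isWhitespace(data[i + 2]);
            if (isToken && before && after) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Made-up content stream: the image data "xEIy<zz" contains "EI".
        String content = "BI /W 1 /H 1 ID xEIy<zz EI Q";
        byte[] stream = content.getBytes(StandardCharsets.ISO_8859_1);
        int dataStart = content.indexOf("ID ") + 3;

        System.out.println("naive end:     " + naiveEnd(stream, dataStart));
        System.out.println("delimited end: " + delimitedEnd(stream, dataStart));
    }
}
```

With the naive scan the image ends inside its own data, so the remaining bytes ("y<zz ...") are handed back to the content parser as if they were operators and operands. A stray "<" in that remainder would plausibly be read as the start of a hex string, which may be how the FormatException in HexToByteArray arises. Note that the delimited check is still a heuristic: binary image data can itself contain whitespace, EI, whitespace.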

Thank you.

Stefano Chizzolini - 2015-05-23

Status: open --> closed-fixed

Priority: 5 --> 2
