Input string was not in a correct format
Hi, I get the following error when I try to extract the text from the attached PDF:
EXCEPTION: System.FormatException: Input string was not in a correct format.
   at System.Number.StringToNumber(String str, NumberStyles options, NumberBuffer& number, NumberFormatInfo info, Boolean parseDecimal)
   at System.Number.ParseInt32(String s, NumberStyles style, NumberFormatInfo info)
   at System.Byte.Parse(String s, NumberStyles style, NumberFormatInfo info)
   at org.pdfclown.util.ConvertUtils.HexToByteArray(String data)
   at org.pdfclown.objects.PdfString.set_Value(Object value)
   at org.pdfclown.objects.PdfString..ctor(String value, SerializationModeEnum serializationMode)
   at org.pdfclown.objects.PdfByteString..ctor(String value)
   at org.pdfclown.documents.contents.tokens.ContentParser.ParsePdfObject()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseOperation()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseContentObject()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseContentObjects()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseContentObject()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseContentObjects()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseContentObject()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseContentObjects()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseContentObject()
   at org.pdfclown.documents.contents.tokens.ContentParser.ParseContentObjects()
   at org.pdfclown.documents.contents.Contents.Load()
   at org.pdfclown.documents.contents.Contents..ctor(PdfDirectObject baseObject, IContentContext contentContext)
   at org.pdfclown.documents.contents.Contents.Wrap(PdfDirectObject baseObject, IContentContext contentContext)
   at org.pdfclown.documents.Page.get_Contents()
   at org.pdfclown.documents.contents.ContentScanner..ctor(IContentContext contentContext)
   at org.pdfclown.tools.TextExtractor.Extract(IContentContext contentContext)
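Inline images whose data happened to contain the bytes "EI" (the end-of-image operator) were truncated at that point, which would explain why the parser then tried to read the leftover image bytes as ordinary content-stream objects and failed. The fix now detects the full end-of-image byte sequence (whitespace, EI, whitespace) instead of a bare "EI".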
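
To illustrate the difference, here is a minimal Java sketch of the two scanning strategies. It is not the actual PDF Clown code: the class and method names (InlineImageScan, naiveEnd, delimitedEnd) are made up for this example, and it assumes the inline image data is already available as a byte array.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not taken from the PDF Clown sources.
final class InlineImageScan {

    // PDF white-space bytes: NUL, TAB, LF, FF, CR, SPACE.
    static boolean isPdfWhitespace(int b) {
        return b == 0 || b == 9 || b == 10 || b == 12 || b == 13 || b == 32;
    }

    // Old behaviour (as described above): stop at the first bare "EI".
    static int naiveEnd(byte[] data, int start) {
        for (int i = start; i + 1 < data.length; i++) {
            if (data[i] == 'E' && data[i + 1] == 'I') {
                return i;
            }
        }
        return -1;
    }

    // Fixed behaviour: "EI" must be preceded by white space and followed
    // by white space (or by the end of the data).
    static int delimitedEnd(byte[] data, int start) {
        for (int i = Math.max(start, 1); i + 1 < data.length; i++) {
            boolean isEi = data[i] == 'E' && data[i + 1] == 'I';
            boolean wsBefore = isPdfWhitespace(data[i - 1] & 0xFF);
            boolean wsAfter = i + 2 == data.length
                    || isPdfWhitespace(data[i + 2] & 0xFF);
            if (isEi && wsBefore && wsAfter) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Image data "abEIcd" contains "EI"; the real operator follows the newline.
        byte[] data = "ID abEIcd\nEI Q".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(naiveEnd(data, 3));     // prints 5 (truncates the image)
        System.out.println(delimitedEnd(data, 3)); // prints 10 (the real EI operator)
    }
}
```

Note that even the delimited check is a heuristic: binary image data can, by chance, contain white space followed by "EI" and more white space, so a stricter parser may also need to take the image's declared dimensions or filters into account.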
Fixed on 0.1.2-Fix branch (rev 214) and 0.2.0 trunk (rev 215).
Thank you.