#73 Object reference not set to an instance of an object

0.1.2.1
closed-fixed
None
2
2015-05-25
2015-05-14
No

An error occurs when trying to extract text from the attached PDF.

EXCEPTION: System.NullReferenceException: Object reference not set to an instance of an object.
   at org.pdfclown.documents.contents.fonts.Font.get_Flags()
   at org.pdfclown.documents.contents.fonts.SimpleFont.LoadEncoding()
   at org.pdfclown.documents.contents.fonts.SimpleFont.OnLoad()
   at org.pdfclown.documents.contents.fonts.Font.Load()
   at org.pdfclown.documents.contents.fonts.Font..ctor(PdfDirectObject baseObject)
   at org.pdfclown.documents.contents.fonts.SimpleFont..ctor(PdfDirectObject baseObject)
   at org.pdfclown.documents.contents.fonts.TrueTypeFont..ctor(PdfDirectObject baseObject)
   at org.pdfclown.documents.contents.fonts.Font.Wrap(PdfDirectObject baseObject)
   at org.pdfclown.documents.contents.FontResources.Wrap(PdfDirectObject baseObject)
   at org.pdfclown.documents.contents.ResourceItems`1.get_Item(PdfName key)
   at org.pdfclown.documents.contents.objects.SetFont.GetResource(IContentContext context)
   at org.pdfclown.documents.contents.objects.SetFont.GetFont(IContentContext context)
   at org.pdfclown.documents.contents.objects.SetFont.Scan(GraphicsState state)
   at org.pdfclown.documents.contents.ContentScanner.MoveNext()
   at org.pdfclown.documents.contents.ContentScanner.TextWrapper.Extract(ContentScanner level)
   at org.pdfclown.documents.contents.ContentScanner.TextWrapper..ctor(ContentScanner scanner)
   at org.pdfclown.documents.contents.ContentScanner.GraphicsObjectWrapper.Get(ContentScanner scanner)
   at org.pdfclown.documents.contents.ContentScanner.get_CurrentWrapper()
   at org.pdfclown.tools.TextExtractor.Extract(ContentScanner level, IList`1 extractedTextStrings)
   at org.pdfclown.tools.TextExtractor.Extract(IContentContext contentContext)
   at Digitaldoc.WebAPI.Services.Extractors.PdfToText.Extract()

Thanks for your attention.

1 Attachment

Discussion

  • Stefano Chizzolini

    • status: open --> closed-fixed
    • assigned_to: Stefano Chizzolini
    • Priority: 5 --> 2
     
  • Stefano Chizzolini

    The attached document has fonts without font descriptors, even though the PDF 1.7 specification requires them. The fix introduces more relaxed behavior that tolerates this spec violation.

    Fixed on the 0.1.2-Fix branch (rev 216) and the 0.2.0 trunk (rev 217).

    Thank you.
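
The relaxed behavior described in the comment above can be illustrated with a minimal Java sketch. This is not PDF Clown's actual code: the class `FontFlagsSketch`, both methods, and the plain maps standing in for PDF dictionaries are assumptions made purely for illustration. The strict reader assumes a `FontDescriptor` entry is always present and fails with a null reference on fonts that lack one, much like the `Font.get_Flags()` frame at the top of the trace. The tolerant reader falls back to 0 (no flags set) instead.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: plain maps stand in for PDF dictionaries.
final class FontFlagsSketch {

    // Strict reader: assumes the font dictionary always has a FontDescriptor.
    // Throws NullPointerException when the descriptor (or its Flags) is missing.
    static int readFlagsStrict(Map<String, Object> fontDict) {
        Map<?, ?> descriptor = (Map<?, ?>) fontDict.get("FontDescriptor");
        return (Integer) descriptor.get("Flags");
    }

    // Tolerant reader: treats a missing descriptor or missing Flags entry
    // as "no flags set" rather than failing.
    static int readFlagsTolerant(Map<String, Object> fontDict) {
        Object descriptor = fontDict.get("FontDescriptor");
        if (!(descriptor instanceof Map)) {
            return 0; // spec violation tolerated: no descriptor at all
        }
        Object flags = ((Map<?, ?>) descriptor).get("Flags");
        return (flags instanceof Integer) ? (Integer) flags : 0;
    }

    public static void main(String[] args) {
        // A font dictionary without a FontDescriptor, as in the reported document.
        Map<String, Object> brokenFont = new HashMap<>();
        brokenFont.put("Subtype", "TrueType");

        // A well-formed font dictionary for comparison.
        Map<String, Object> descriptor = new HashMap<>();
        descriptor.put("Flags", 32); // 32 = Nonsymbolic (bit 6) in PDF 1.7
        Map<String, Object> goodFont = new HashMap<>();
        goodFont.put("Subtype", "TrueType");
        goodFont.put("FontDescriptor", descriptor);

        try {
            readFlagsStrict(brokenFont);
        } catch (NullPointerException e) {
            System.out.println("strict reader failed on the missing descriptor");
        }
        System.out.println("tolerant, broken font: " + readFlagsTolerant(brokenFont));
        System.out.println("tolerant, good font: " + readFlagsTolerant(goodFont));
    }
}
```

Whether 0 is the right fallback depends on how the caller uses the flags; the point of the sketch is only that the absence of the descriptor is checked before it is dereferenced.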

     
