#34 KeyNotFoundException using TextExtractor

I use PDFClown to extract plain text from some PDF documents that I'm not allowed to share, unfortunately.
I hope the following information is enough to identify and fix the problem, though.
AssemblyVersion: 0.1.1
This is my high-level code:

StringBuilder builder=new StringBuilder();
using (Stream input=new Stream(ioStream)) {
    using (File inputFile=new File(input)) {
        TextExtractor extractor=new TextExtractor();
        foreach (var page in inputFile.Document.Pages) {

The following is the stack trace of the exception:

System.Collections.Generic.KeyNotFoundException: The given key was not present in the dictionary.
at System.Collections.Generic.Dictionary`2.get_Item(TKey key)
at org.pdfclown.documents.contents.fonts.SimpleFont.OnLoad()
at org.pdfclown.documents.contents.fonts.Font.Load()
at org.pdfclown.documents.contents.fonts.Font..ctor(PdfDirectObject baseObject)
at org.pdfclown.documents.contents.fonts.SimpleFont..ctor(PdfDirectObject baseObject)
at org.pdfclown.documents.contents.fonts.TrueTypeFont..ctor(PdfDirectObject baseObject)
at org.pdfclown.documents.contents.fonts.Font.Wrap(PdfDirectObject baseObject)
at org.pdfclown.documents.contents.FontResources.Wrap(PdfDirectObject baseObject)
at org.pdfclown.documents.contents.ResourceItems`1.get_Item(PdfName key)
at org.pdfclown.documents.contents.objects.SetFont.GetResource(IContentContext context)
at org.pdfclown.documents.contents.objects.SetFont.GetFont(IContentContext context)
at org.pdfclown.documents.contents.objects.SetFont.Scan(GraphicsState state)
at org.pdfclown.documents.contents.ContentScanner.MoveNext()
at org.pdfclown.documents.contents.ContentScanner.TextWrapper.Extract(ContentScanner level)
at org.pdfclown.documents.contents.ContentScanner.TextWrapper..ctor(ContentScanner scanner)
at org.pdfclown.documents.contents.ContentScanner.GraphicsObjectWrapper.Get(ContentScanner scanner)
at org.pdfclown.documents.contents.ContentScanner.get_CurrentWrapper()
at org.pdfclown.tools.TextExtractor.Extract(ContentScanner level, IList`1 extractedTextStrings)
at org.pdfclown.tools.TextExtractor.Extract(ContentScanner level, IList`1 extractedTextStrings)
at org.pdfclown.tools.TextExtractor.Extract(IContentContext contentContext)

I'm sorry that I cannot provide a sample PDF.
My current work-around in SimpleFont.OnLoad() looks like this, but I really don't know how correct that solution is:

if (glyphWidth > 0) {
    int code;
    if (codes.TryGetValue(charCode, out code)) {
        int idx;
        if (glyphIndexes.TryGetValue(code, out idx)) {
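
For readers unfamiliar with the pattern, the work-around above replaces dictionary indexer access (which throws KeyNotFoundException when the key is absent) with TryGetValue (which returns false instead). The following is only a minimal, self-contained sketch of that guard. The class name, method name, and the use of plain Dictionary<int, int> are assumptions made for illustration; the actual field types in SimpleFont.OnLoad() may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the two lookups in SimpleFont.OnLoad().
// The names and dictionary types are illustrative, not PDF Clown's own.
public static class GlyphLookupSketch
{
    // Returns true and sets glyphIndex only when both lookups succeed;
    // otherwise returns false and sets glyphIndex to -1.
    public static bool TryResolveGlyphIndex(
        IDictionary<int, int> codes,
        IDictionary<int, int> glyphIndexes,
        int charCode,
        out int glyphIndex)
    {
        int code;
        int idx;
        if (codes.TryGetValue(charCode, out code)
            && glyphIndexes.TryGetValue(code, out idx))
        {
            glyphIndex = idx;
            return true;
        }
        glyphIndex = -1;
        return false;
    }

    public static void Main()
    {
        var codes = new Dictionary<int, int> { { 65, 0x41 } };
        var glyphIndexes = new Dictionary<int, int> { { 0x41, 36 } };

        int idx;
        // Both keys are present, so the lookup succeeds.
        Console.WriteLine(TryResolveGlyphIndex(codes, glyphIndexes, 65, out idx));
        // Character code 66 is not mapped, so the guard reports failure.
        Console.WriteLine(TryResolveGlyphIndex(codes, glyphIndexes, 66, out idx));

        try
        {
            // The indexer throws for a missing key, as in the stack trace.
            int missing = codes[66];
            Console.WriteLine(missing);
        }
        catch (KeyNotFoundException)
        {
            Console.WriteLine("indexer threw KeyNotFoundException");
        }
    }
}
```

Note that a guard like this only suppresses the exception: any glyph whose code or index is missing is silently skipped, so the reason the entry is missing in the first place remains unexplained.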


  • Stefano Chizzolini

    • status: open --> pending
  • Stefano Chizzolini

    To solve your issue properly, the actual cause of the missing glyph index has to be examined; unfortunately, that requires the source document.

  • Anonymous - 2012-05-08

    I have come across a bug similar to one in a previous bug posting. However, it seems the original poster did not provide a sample PDF file.

    I have a sample PDF file and CLI output that will hopefully help you fix the bug. Both are linked below. Let me know if the links don't work.

    Also, I am creating a content-tweaking application using PDF Clown. My application closely follows the "object" model of the BasicTextExtraction sample because I am parsing at the ContentObject level. In my experience, after the KeyNotFoundException is thrown, an IndexOutOfRangeException is thrown. If you try some fixes, I would be happy to test them on my PDFs.

    pdf file: http://dl.dropbox.com/u/370470/Pages%20from%20Iraqs_WMD_Vol1.pdf
    cli output: http://dl.dropbox.com/u/370470/pdfclownCLI%20output.txt

  • Stefano Chizzolini

    • status: pending --> open
  • Stefano Chizzolini

    • assigned_to: nobody --> stechio
  • Stefano Chizzolini

    This issue has been fixed (see branch 0.1.2-Fix).

  • Stefano Chizzolini

    • status: open --> closed-out-of-date
    • Group: -->
    • Priority: 5 --> 3
