Getting Font information from TextExtract

  • Gully APC Burns

    Gully APC Burns - 2006-05-24

    Within a text extraction routine where I iterate over the nodes in a PDF document, I'd like to obtain the font information.

    Suppose n is a leaf node. How can I find out the font of the text it contains?

    This is the very ugly hack I've tried, but it doesn't get the appropriate font information for every node, only for some of them.

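    String font = "";
    // Look at the mark at index 0 on the leaf node n
    Mark m = n.getSticky(0);
    if (m.getOwner() instanceof SpanPDF) {
        SpanPDF span = (SpanPDF) m.getOwner();
        font = span.font.getName();
        // Drop everything up to and including the "+" (the subset prefix)
        font = font.substring(font.indexOf("+") + 1);
    }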
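
For reference, the "+" handling in the snippet above can be made a bit stricter. In PDF, a subsetted font's name carries a tag of exactly six uppercase letters followed by "+" (for example "ABCDEF+Times-Roman"). The following is only a minimal sketch in plain Java, independent of Multivalent; the class name SubsetFontName and the method stripSubsetTag are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Multivalent.
class SubsetFontName {
    // PDF subset tag: exactly six uppercase ASCII letters followed by '+'.
    private static final Pattern SUBSET_TAG = Pattern.compile("^[A-Z]{6}\\+");

    /** Returns the font name without a leading subset tag, or the name unchanged. */
    static String stripSubsetTag(String fontName) {
        if (fontName == null) {
            return "";
        }
        return SUBSET_TAG.matcher(fontName).replaceFirst("");
    }

    public static void main(String[] args) {
        System.out.println(stripSubsetTag("ABCDEF+Times-Roman"));
        System.out.println(stripSubsetTag("Helvetica"));
    }
}
```

Unlike a bare indexOf("+"), this leaves a name alone when the "+" is not part of a six-letter subset tag.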


    Do you have any suggestions, or ideas for places I could look to solve this?



    • Tom Phelps

      Tom Phelps - 2006-05-24

      The node.getSticky() holds span transitions, and a span may cover many nodes or just part of one node, so the marks on a single leaf do not always tell you its font.  Instead, get the document's multivalent.Context from the style sheet, traverse the document tree, invoke context.reset(node, offset) at each span transition, then read the font attributes from the Context.  PDF uses the "spot font" field for embedded fonts.  PDF does not have overlapping font-related spans, so you can be somewhat simpler for this particular task.
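
To illustrate why reading the marks on one leaf is not enough, here is a rough, self-contained sketch of the bookkeeping described above. It does not use Multivalent at all: FontSpan, Run and fontRuns are hypothetical stand-ins, leaves are plain strings, and spans are assumed not to overlap (as noted for PDF). In Multivalent itself the Context would do this work after context.reset(node, offset).

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these types exist in Multivalent.
class FontRunSketch {
    /** A font span from (startLeaf, startOffset) inclusive to (endLeaf, endOffset) exclusive. */
    static final class FontSpan {
        final int startLeaf, startOffset, endLeaf, endOffset;
        final String font;
        FontSpan(int startLeaf, int startOffset, int endLeaf, int endOffset, String font) {
            this.startLeaf = startLeaf; this.startOffset = startOffset;
            this.endLeaf = endLeaf; this.endOffset = endOffset;
            this.font = font;
        }
    }

    /** A piece of leaf text together with the font in effect over it. */
    static final class Run {
        final String text, font;
        Run(String text, String font) { this.text = text; this.font = font; }
        @Override public String toString() { return "\"" + text + "\" in " + font; }
    }

    /** Walks the leaves in document order and splits them at span transitions. */
    static List<Run> fontRuns(List<String> leaves, List<FontSpan> spans, String defaultFont) {
        List<Run> runs = new ArrayList<>();
        for (int i = 0; i < leaves.size(); i++) {
            String text = leaves.get(i);
            int pos = 0;
            while (pos < text.length()) {
                String font = defaultFont;
                int next = text.length(); // next transition inside this leaf
                for (FontSpan s : spans) {
                    boolean started = s.startLeaf < i || (s.startLeaf == i && s.startOffset <= pos);
                    boolean ended = s.endLeaf < i || (s.endLeaf == i && s.endOffset <= pos);
                    if (started && !ended) {
                        font = s.font; // span is active at (i, pos)
                        if (s.endLeaf == i) next = Math.min(next, s.endOffset);
                    } else if (!started && s.startLeaf == i) {
                        next = Math.min(next, s.startOffset); // span opens later in this leaf
                    }
                }
                runs.add(new Run(text.substring(pos, next), font));
                pos = next;
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        List<String> leaves = List.of("Plain ", "text Bold", " more");
        List<FontSpan> spans = List.of(
            new FontSpan(0, 0, 1, 5, "Helvetica"),
            new FontSpan(1, 5, 2, 5, "Helvetica-Bold"));
        for (Run r : fontRuns(leaves, spans, "default")) {
            System.out.println(r);
        }
    }
}
```

Note that the last leaf in the example carries no span start of its own, yet it still has a font, because the span that opened in the previous leaf is still active.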

    • Gully APC Burns

      Gully APC Burns - 2006-05-24

      OK, this looks somewhat doable. If I figure out a solution, I'll post it here.

    • Gully APC Burns

      Gully APC Burns - 2006-06-07

      OK, done, this works perfectly and was very easy to implement.

      Thanks enormously!

      The solution is spread out over many places in the code but is exactly as you suggested (so this is completely superfluous, but I said I'd post it so hey).

      StyleSheet ss = doc.getStyleSheet();
      Context context = ss.getContext();

      // Given a node n
      context.reset(n, 1);
      // Read the font in effect at n from the context
      NFont f =;

      Thanks again


