Possible bug determining text dimensions

2011-08-12
2013-01-26
  • David Morse
    2011-08-12

    I have a question about a segment of code in org.pdfclown.documents.contents.objects.ShowText:

    /*
      NOTE: The text rendering matrix is recomputed before each glyph is painted
      during a text-showing operation.
    */
    Matrix trm = ctm.Clone(); trm.Multiply(tm);
    float charHeight = font.GetHeight(textChar,fontSize);
    drawing::RectangleF charBox = new drawing::RectangleF(
      trm.Elements,
      contextHeight - trm.Elements - font.GetAscent(fontSize) * tm.Elements,
      charWidth * tm.Elements,
      charHeight * tm.Elements
    );
    textScanner.ScanChar(textChar,charBox);

    This code multiplies the Current Transform Matrix by the Text Matrix to compute the Text Render Matrix. The charBox is then computed using a combination of the Text Render Matrix and the original Text Matrix. I would think that it should only be using the Text Render Matrix in its computations.

    In the PDF I am extracting text from, the coordinates reported for text have a correct X and a slightly incorrect Y, and the width and height are way off (compared with the output of iText and Foxit PDF Editor). If I modify this code to use the Text Render Matrix for all computations, I get results consistent with iText. In my example, my Current Transform Matrix is { .03 0 0 .03 0 0 }.
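
    To see why mixing the two matrices distorts the box, here is a rough worked sketch. It assumes a generic Text Matrix [a b c d e f] (placeholder symbols, not values from the PDF in question), a uniform Current Transform Matrix scale s = 0.03 as in the example above, and a glyph width w in text space. Like the snippet, it ignores the font size, horizontal scaling and rise terms.

    ```latex
    T_{rm} = T_m \times CTM
    = \begin{bmatrix} a & b & 0 \\ c & d & 0 \\ e & f & 1 \end{bmatrix}
      \begin{bmatrix} s & 0 & 0 \\ 0 & s & 0 \\ 0 & 0 & 1 \end{bmatrix}
    = \begin{bmatrix} sa & sb & 0 \\ sc & sd & 0 \\ se & sf & 1 \end{bmatrix}

    % Width scaled by the Text Matrix versus the Text Render Matrix:
    \frac{w \cdot a}{w \cdot s a} = \frac{1}{s} = \frac{1}{0.03} \approx 33.3
    ```

    Under these assumptions the origin (se, sf) already comes from the Text Render Matrix, which would explain the correct X, while any size term scaled by the Text Matrix alone comes out roughly 33 times too large.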

    Am I simply misunderstanding how this should work or should this be changed?

    Thanks!
    BTW great library!

     
  • You were right: I fixed them on both trunk (rev 47) and 0.1.0 fix branch (rev 46).

    Thank you for your bug report!
    Stefano