I use CSCharacterParser to highlight words in a custom PDF viewer I'm working on. I found that in rare cases glyphAscent and glyphDescent are both 0. To fix/hack this I use the following in "basicTextShowGlyphs":
    @Override
    protected void basicTextShowGlyphs(PDGlyphs glyphs, float advance)
            throws CSException {
        AffineTransform tx;
        tx = (AffineTransform) getDeviceTransform().clone();
        tx.concatenate(textState.globalTransform);
        lastStartX = tx.getTranslateX();
        lastStartY = tx.getTranslateY();
        // get the transformed character bounding box
        double glyphAscent = glyphs.getAscent();
        double glyphDescent = glyphs.getDescent();
        double ascent = (textState.fontSize * glyphAscent) / THOUSAND;
        double descent = (textState.fontSize * glyphDescent) / THOUSAND;
        // hack begin: fall back to 3/4 and 1/4 of the font size
        // when the font reports no ascent and no descent
        if (ascent == 0.0 && descent == 0.0) {
            ascent = (textState.fontSize * 3) / 4;
            descent = (textState.fontSize * 1) / 4;
        }
        // hack end
        // descent is measured downwards from the baseline, so force it negative
        if (descent > 0) {
            descent = -descent;
        }
        double[] pts = new double[] { 0, descent, advance, ascent };
        tx.deltaTransform(pts, 0, pts, 0, 2);

        float x = (float) lastStartX;
        float y = (float) (lastStartY + pts[1]);
        float width = (float) pts[2];
        float height = (float) (pts[3] - pts[1]);
        // normalize the rectangle so width and height are non-negative
        if (width < 0) {
            x += width;
            width = -width;
        }
        if (height < 0) {
            y += height;
            height = -height;
        }
        Rectangle2D charRect = new Rectangle2D.Float(x, y, width, height);
        if (getBounds() == null || getBounds().intersects(charRect)) {
            onCharacterFound(glyphs, charRect);
        }
        // advance text matrix and store position for reference
        super.basicTextShowGlyphs(glyphs, advance);
        tx = (AffineTransform) getDeviceTransform().clone();
        tx.concatenate(textState.globalTransform);
        lastStopX = tx.getTranslateX();
        lastStopY = tx.getTranslateY();
    }
Please let me know if you have a better suggestion.
Best, Peter
This sounds reasonable as a workaround, but it may be better to get to the root of the problem. Maybe you can send us a document from which this possibly wrong ascent/descent is derived. If a workaround is absolutely required, one that computes a default would be better implemented in the PDFontDescriptor itself.
ciao, Michael
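For illustration only, here is a minimal sketch of what a single, centralized default could look like if it were pulled out of the parser. The class GlyphMetricsFallback and its resolve method are hypothetical names, not part of the library, and the 3/4 and 1/4 factors are simply the ones from the hack above, not values taken from any font.

```java
/** Hypothetical helper, not part of the library: one place for the default metrics. */
final class GlyphMetricsFallback {
    // Assumed defaults, copied from the hack above (3/4 and 1/4 of the font size).
    static final double DEFAULT_ASCENT_FACTOR = 0.75;
    static final double DEFAULT_DESCENT_FACTOR = 0.25;
    // Glyph metrics are given in thousandths of the font size.
    static final double THOUSAND = 1000.0;

    private GlyphMetricsFallback() {
    }

    /**
     * Scales the raw glyph metrics by the font size and returns
     * { ascent, descent }, with descent always zero or negative.
     * If both raw values are 0, the assumed defaults are used instead.
     */
    static double[] resolve(double glyphAscent, double glyphDescent, double fontSize) {
        double ascent = (fontSize * glyphAscent) / THOUSAND;
        double descent = (fontSize * glyphDescent) / THOUSAND;
        if (ascent == 0.0 && descent == 0.0) {
            ascent = fontSize * DEFAULT_ASCENT_FACTOR;
            descent = fontSize * DEFAULT_DESCENT_FACTOR;
        }
        if (descent > 0) {
            descent = -descent;
        }
        return new double[] { ascent, descent };
    }
}
```

With a helper like this, the override would only need to call GlyphMetricsFallback.resolve(glyphs.getAscent(), glyphs.getDescent(), textState.fontSize) and use the two returned values, so the default lives in one spot and is easy to move into the font descriptor later.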
In the following document glyphs.getAscent() and glyphs.getDescent() return 0:
https://drive.google.com/file/d/0BxXDYY2kfIQ0cmFZNFh5VnMxR28/edit?usp=sharing
Best, Peter
Well, this doc has about 130 pages. You do not want me to scan through all of it, do you? A look at the first pages shows three (standard) fonts in use, each of which correctly evaluates to a nonzero ascent / descent.
Maybe you could narrow this down with a code snippet acting on some specific part....
Here is a single page for which all chars return 0 from getAscent() and getDescent():
https://drive.google.com/file/d/0BxXDYY2kfIQ0c3VNeElNUlBIY1E/edit?usp=sharing
I'm sorry, my understanding of working with fonts in a PDF is very limited.
Best, Peter
This page references two fonts, both of them with ascent and descent explicitly set to 0. So the creator gets what he deserves :-) There is nothing you can work around here, except on the creator side.
If I remember correctly, the font descriptor should not even be included, as it is derived from one of the base fonts. And if it is embedded, it should at least copy the correct values.