When using CSTextExtractor to extract text, is there a way to determine whether a glyph is part of PDF bookmark text or PDF comment text?
What exactly do you mean by "pdf-bookmark-text" or "pdf-comment-text"? Are these annotations?
If so, you can restrict the character extraction to the rectangle defined by the annotation using "setBounds".
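To illustrate the idea of restricting extraction to an annotation's rectangle, here is a minimal plain-Java sketch. It does not call CSTextExtractor or "setBounds" themselves; the `Glyph` class, the `textWithin` method, and the coordinates are hypothetical stand-ins, assuming each extracted glyph comes with a bounding box in the same coordinate space as the annotation rectangle.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

public class BoundsFilter {
    /** Hypothetical stand-in for an extracted glyph: its text and bounding box. */
    static final class Glyph {
        final String text;
        final Rectangle2D box;

        Glyph(String text, double x, double y, double w, double h) {
            this.text = text;
            this.box = new Rectangle2D.Double(x, y, w, h);
        }
    }

    /** Concatenates the text of all glyphs whose box intersects the given bounds. */
    static String textWithin(List<Glyph> glyphs, Rectangle2D bounds) {
        StringBuilder sb = new StringBuilder();
        for (Glyph g : glyphs) {
            if (bounds.intersects(g.box)) {
                sb.append(g.text);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Glyph> glyphs = new ArrayList<>();
        glyphs.add(new Glyph("A", 10, 10, 5, 8));   // inside the rectangle
        glyphs.add(new Glyph("B", 16, 10, 5, 8));   // inside the rectangle
        glyphs.add(new Glyph("C", 200, 300, 5, 8)); // outside the rectangle

        // Assumed annotation rectangle, in the same coordinates as the glyph boxes.
        Rectangle2D annotationRect = new Rectangle2D.Double(0, 0, 50, 50);
        System.out.println(textWithin(glyphs, annotationRect)); // prints "AB"
    }
}
```

Running the extraction once per annotation rectangle and once for the whole page would then let you tell which glyphs fall inside an annotation and which belong to the ordinary page content.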
I was looking at the following PDF file:
Extracting it, I get the following text:
In it the quotes are doubled, and I thought they had been added as labels/anchors or comments inside the PDF. But I think the PDF was just prepared this way. I'm sorry for the trouble.