#29 Extracting subscript char - Issue in some pdf

Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2009-05-05
Created: 2009-05-05
Creator: eclipse79
Private: No

Hello,
I'm trying PDFBox 0.7.3 to extract text from PDF files, but I have noticed a problem with subscript characters. The issue occurs in most of the PDFs I have, though not in all of them. The word "CO2", where the 2 is a subscript character, appears very often in my documents. For some files, the extracted text has a CRLF before and after the "2".
Here are some examples of the extracted text:

Inoltre, utilizzando unicamente combustibili fossili, il comparto non ha la possibilità di ridurre le
emissioni di CO
2
.

- la riduzione dell?impronta CO
2
complessiva,
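
For now, the only workaround I can think of is to post-process the extracted string and glue those isolated digits back onto the previous line. Below is a rough sketch of that idea, not a fix inside PDFBox itself. The class name `SubscriptRejoiner` and the method `rejoin` are made up for illustration, and it assumes that a line containing only one or two digits is always a stray subscript, which will be wrong for things like page numbers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox: works only on the already extracted text.
public class SubscriptRejoiner {

    // Non-space char, line break, a line holding only 1-2 digits, line break, non-space char.
    private static final Pattern LONE_DIGITS =
            Pattern.compile("(\\S)\\r?\\n(\\d{1,2})\\r?\\n(\\S)");

    public static String rejoin(String text) {
        Matcher m = LONE_DIGITS.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String next = m.group(3);
            // Assumption: no space before closing punctuation, one space before anything else.
            String sep = next.matches("[.,;:!?)]") ? "" : " ";
            String joined = m.group(1) + m.group(2) + sep + next;
            m.appendReplacement(out, Matcher.quoteReplacement(joined));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Calling `SubscriptRejoiner.rejoin(...)` on the two examples above should give "emissioni di CO2." and "impronta CO2 complessiva,". It does nothing when the line before the digit ends with a space, and it cannot tell a subscript from any other short number on its own line, so I would prefer a proper solution in the extraction step.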

Can anybody help me?
Thank you
Eclipse79
