Extracting subscript characters - issue with some PDFs
Hello,
I'm using PDFBox 0.7.3 to extract text from PDF files, but I have run into a problem with subscript characters. It happens with most (but not all) of my PDF files. The word "CO2", where the "2" is a subscript, appears very often, and in some files the extracted text contains a CRLF both before and after the "2".
These are some examples:
Moreover, since it uses only fossil fuels, the sector is unable to reduce its
emissions of CO
2
.
- the reduction of the overall CO
2
footprint,
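The split is probably caused by the subscript being drawn on a lower baseline, which the extractor then treats as the start of a new line. Until that is fixed at extraction time, one possible stopgap is to post-process the extracted string. The sketch below is a hypothetical helper: `SubscriptJoiner` and `joinSubscripts` are illustrative names, not part of PDFBox. It assumes that one or two digits standing alone on a line, directly after a line that ends in a letter, are a split subscript. That heuristic can also merge lines that legitimately consist of a single number, so it is a sketch rather than a general fix.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical post-processing helper (not PDFBox API): re-attaches
 * subscript digits that text extraction has put on a line of their own.
 * Assumption: 1-2 digits alone on a line after a letter are a subscript.
 */
public class SubscriptJoiner {

    // Letter, line break, 1-2 digits alone on a line, line break, followed by punctuation.
    // Rejoined without a space, so "CO\r\n2\r\n." becomes "CO2.".
    private static final Pattern BEFORE_PUNCTUATION =
            Pattern.compile("(\\p{L})\\r?\\n(\\d{1,2})\\r?\\n(?=[.,;:!?)])");

    // Same shape followed by anything else: rejoined with a single space,
    // so "CO\r\n2\r\nfootprint" becomes "CO2 footprint".
    private static final Pattern BEFORE_WORD =
            Pattern.compile("(\\p{L})\\r?\\n(\\d{1,2})\\r?\\n");

    public static String joinSubscripts(String text) {
        // Handle the punctuation case first so it does not get an extra space.
        String result = BEFORE_PUNCTUATION.matcher(text).replaceAll("$1$2");
        return BEFORE_WORD.matcher(result).replaceAll("$1$2 ");
    }

    public static void main(String[] args) {
        String extracted = "emissions of CO\r\n2\r\n.\r\n"
                + "- the reduction of the overall CO\r\n2\r\nfootprint,";
        System.out.println(joinSubscripts(extracted));
    }
}
```

Run on the two examples above, this prints "emissions of CO2." and "- the reduction of the overall CO2 footprint," on separate lines. Lines that do not match the pattern are left untouched.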
Can anybody help me?
Thank you
Eclipse79