I'm trying PDFBox 0.7.3 to extract text from PDF files, but I have noticed a problem with subscript characters. The issue occurs in most of my PDFs (though not all of them). The word "CO2", where the 2 is a subscript, appears very often in my documents. For some files, the extracted text has a CRLF before and after the "2".
These are some examples:
Inoltre, utilizzando unicamente combustibili fossili, il comparto non ha la possibilità di ridurre le
emissioni di CO
- la riduzione dell?impronta CO
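As a stopgap, I could clean up the text after extraction. The following is only a minimal sketch of that idea, not a PDFBox feature: the class `SubscriptJoiner` and its method `joinSubscripts` are hypothetical names of my own, and the sketch assumes the subscript always ends up as one or two digits alone on their own line, directly after a line that ends in a letter.

```java
import java.util.regex.Pattern;

public class SubscriptJoiner {
    // Assumption: a stray subscript shows up as 1-2 digits alone on a line,
    // preceded by a line that ends in a letter (for example "CO").
    private static final Pattern ISOLATED_DIGITS =
        Pattern.compile("(?<=[A-Za-z])[ \\t]*\\r?\\n[ \\t]*(\\d{1,2})[ \\t]*\\r?\\n");

    // Pulls the isolated digits back onto the previous line and replaces
    // the second line break with a single space.
    public static String joinSubscripts(String extracted) {
        return ISOLATED_DIGITS.matcher(extracted).replaceAll("$1 ");
    }

    public static void main(String[] args) {
        String sample = "emissioni di CO\r\n2\r\nnel settore";
        // prints "emissioni di CO2 nel settore"
        System.out.println(joinSubscripts(sample));
    }
}
```

This would also merge any legitimate line that contains only a short number (a page number, for instance) into the line before it, so it is not a real fix for the extraction itself.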
Can anybody help me?