pdf2xml is a converter based on the Xpdf library (http://www.foolabs.com/xpdf/home.html). It converts the information contained in a PDF file into XML. Before using it, you need to install Xpdf and libxml2 (see the documentation).
Hervé Déjean, Xerox Research Centre Europe
- PDF to XML conversion
- text extraction
- vector graphics instruction extraction
Used on the IRS f1040.pdf to produce f1040.xml. However, when the XML file was opened in Firefox, Firefox reported that it had no style information associated with it, so it looked nothing like the PDF file as displayed by Adobe Reader.
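Since the output is XML data rather than a styled page, it may be more useful to inspect it with a script than in a browser. The following is a minimal sketch using Python's standard library. It assumes nothing about the element names pdf2xml emits and simply counts whatever tags it finds and collects their text. The function name summarize_xml is hypothetical and is not part of pdf2xml.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_xml(source):
    """Return (tag_counts, texts) for an XML document.

    `source` may be a file name or a binary file object.
    No particular schema is assumed: every element is counted by tag name.
    """
    tree = ET.parse(source)
    tag_counts = Counter()
    texts = []
    # iter() walks every element in document order, starting at the root
    for elem in tree.getroot().iter():
        # Drop a namespace prefix of the form "{uri}tag", if present
        tag = elem.tag.rsplit("}", 1)[-1]
        tag_counts[tag] += 1
        # Keep only non-empty text that sits directly inside the element
        if elem.text and elem.text.strip():
            texts.append(elem.text.strip())
    return tag_counts, texts
```

Calling summarize_xml("f1040.xml") on the file mentioned above would return the tag counts and the extracted text strings, which gives a quick picture of what the conversion captured.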