Extracting Words from PDF

  • MG Balaji

    MG Balaji - 2009-02-16


    pdftohtml is an excellent tool. I downloaded the "pdftohtml-0.39-win32" version and tried converting some PDFs to XML. It extracts the text line by line, with the top, left, width and height information for each line.

    But I want to extract the text word by word, with top, left, width and height info for each word. Is that possible? Can anyone tell me how I can get this?


    • Matthew Potter

      Matthew Potter - 2009-08-18

      Have you heard anything or figured anything out regarding this? I know there is a tool, "pdftoxml", which uses pdftohtml, but I can't seem to compile it on any of my Macs. pdftohtml works, but it outputs line by line rather than word by word.
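
      One possible workaround, sketched here in Python as an approximation only: take the line-level boxes from the XML output and split each line's width across its words in proportion to character count. This assumes the XML contains elements of the form <text top="..." left="..." width="..." height="..."> holding one line each, and it assumes every character is equally wide, which is not true for proportional fonts, so the word boxes are estimates. The function name approximate_word_boxes is made up for this example and is not part of pdftohtml.

```python
import xml.etree.ElementTree as ET


def approximate_word_boxes(xml_text):
    """Estimate per-word boxes from line-level <text> elements.

    Returns a list of (word, top, left, width, height) tuples.
    Assumption: each character in a line has the same width, so the
    left and width values are rough estimates, not true glyph metrics.
    """
    root = ET.fromstring(xml_text)
    boxes = []
    for node in root.iter("text"):
        # itertext() also collects text nested in inline tags such as <b> or <i>.
        line = "".join(node.itertext())
        if not line.strip():
            continue
        top = float(node.get("top"))
        left = float(node.get("left"))
        width = float(node.get("width"))
        height = float(node.get("height"))
        char_width = width / len(line)  # equal-width-character assumption
        pos = 0
        for word in line.split():
            start = line.index(word, pos)  # character offset of this word
            pos = start + len(word)
            boxes.append((
                word,
                top,
                round(left + start * char_width),
                round(len(word) * char_width),
                height,
            ))
    return boxes
```

      To try it, read the XML file that pdftohtml wrote and pass its contents to approximate_word_boxes. Exact word positions would need the real glyph widths, which the line-level XML does not contain.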

