pdftohtml ... How to install?

  • targethebadone


    First of all, I'm French and my English is not good, so thanks for your patience.

    I've downloaded pdftohtml-0.36-win32.zip, as well as other versions of this software, but I can't find a "something.exe" or a "something.setup" in it. My system (Windows XP Pro) doesn't recognize what kind of files I downloaded, so I can't install the software. Do you know why? Maybe I have to download something else, but what, and where? Is this software really "open source"?

    Thanks a lot (if you can read my post ...)


    • Go to https://sourceforge.net/project/showfiles.php?group_id=45839&package_id=40578&release_id=436795 and download the Windows 32 binary for version 0.39. There is NO setup program. Open the Command Prompt from the Windows Start Menu (you can find it under "Accessories"), type:
      "C:\path\to\pdftohtml.exe" "C:\path\to\PDFfile.pdf"
      and press "Enter" to start the conversion. Alternatively, you can drag and drop the source PDF file onto the exe file in Windows Explorer; the target HTML file will be generated in the same folder as your source PDF.
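
      If you would rather script the call than type it each time, the command above can be sketched in Python roughly as follows. This is only a sketch under assumptions: the function names are made up for illustration, and the paths are the same placeholders as in the command above, so you need to substitute your own.

```python
import subprocess


def build_command(exe_path, pdf_path):
    """Return the argument list for a plain run: the program, then the PDF.

    Passing the arguments as a list means paths that contain spaces
    do not need the extra quotes used at the command prompt.
    """
    return [str(exe_path), str(pdf_path)]


def convert(exe_path, pdf_path):
    """Run the converter and return its exit code (hypothetical helper)."""
    completed = subprocess.run(build_command(exe_path, pdf_path))
    return completed.returncode


# Placeholder paths, exactly as elided as in the post above.
command = build_command(r"C:\path\to\pdftohtml.exe", r"C:\path\to\PDFfile.pdf")
print(command)
```

      Calling convert(...) with real paths would start the conversion; the print line only shows what would be executed.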

    • targethebadone

      Well ... thanks a lot, really ...

      I'll try your procedure tomorrow morning.


    • DaveRado

      Hi Andrey

      I just tried that procedure, but it didn't preserve the layout at all, not even close. It also created one huge HTML page from a 180-page PDF document, whereas according to the demo at http://pdftohtml.sourceforge.net/ it should preserve the layout more or less perfectly and create a separate HTML page for each PDF page. What am I missing?


      • Just read this (copied from the program help) and put the appropriate options on the command line, e.g. -c will create a COMPLEX (i.e. multipage) HTML document.

        Usage: pdftohtml [options] <PDF-file> [<html-file> <xml-file>]
        -f <int>          : first page to convert
        -l <int>          : last page to convert
        -q                : don't print any messages or errors
        -h                : print usage information
        -help             : print usage information
        -p                : exchange .pdf links by .html
        -c                : generate complex document
        -i                : ignore images
        -noframes         : generate no frames
        -stdout           : use standard output
        -zoom <fp>        : zoom the pdf document (default 1.5)
        -xml              : output for XML post-processing
        -hidden           : output hidden text
        -nomerge          : do not merge paragraphs
        -enc <string>     : output text encoding name
        -dev <string>     : output device name for Ghostscript (png16m, jpeg etc)
        -v                : print copyright and version info
        -opw <string>     : owner password (for encrypted files)
        -upw <string>     : user password (for encrypted files)
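
        As a rough illustration of how these options combine, here is a small Python sketch that assembles a command line from a few of the flags listed above (-f, -l, -c, -noframes, -zoom). The function and parameter names are invented for this example; only the flags themselves come from the help text, and the options are placed before the PDF file as in the usage line.

```python
def build_options(first_page=None, last_page=None, complex_output=False,
                  no_frames=False, zoom=None):
    """Translate keyword arguments into pdftohtml flags from the help text."""
    options = []
    if first_page is not None:
        options += ["-f", str(first_page)]   # first page to convert
    if last_page is not None:
        options += ["-l", str(last_page)]    # last page to convert
    if complex_output:
        options.append("-c")                 # generate complex document
    if no_frames:
        options.append("-noframes")          # generate no frames
    if zoom is not None:
        options += ["-zoom", str(zoom)]      # zoom factor (help says default 1.5)
    return options


def build_command(exe_path, pdf_path, **kwargs):
    """Usage line order: pdftohtml [options] <PDF-file>."""
    return [exe_path] + build_options(**kwargs) + [pdf_path]


# Example: complex output for pages 1 to 10 of a hypothetical house.pdf.
print(build_command("pdftohtml", "house.pdf",
                    first_page=1, last_page=10, complex_output=True))
```

        The resulting list can be handed to something like subprocess.run; whether -c gives the layout you expect for a particular PDF is something you would have to check yourself.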

    • yuvika yuvika

      I have downloaded the file "pdftohtml-0.36-win32.zip". How do I use it?

      In all the other forums, the advice I found was to use the exe file, but this folder does not contain any exe file. Basically, I want to convert a PDF file uploaded to my ASP.NET application to HTML format. If anyone has done this or could help me, please reply as soon as possible.


    • Blade12775

      Unzip the contents of the file to a folder on your hard drive. In this example, the folder is on drive D: and is called "pdftohtml" (i.e. d:\pdftohtml), but you can put the program in any directory you like.

      Now copy a .pdf file to that directory (e.g. house.pdf; remember to use short names!).

      Go to Start->Run.
      Type cmd and press OK.
      The command-line interpreter opens.
      Type d: [Enter]. Type cd pdftohtml [Enter]. Then type pdftohtml house.pdf [Enter].
      Now the PDFtoHTML creator builds an HTML file structure out of the PDF file.

      You will notice that the directory has three new files: house.html, house_ind.html and houses.html. house.html is the frame file for the frameset. house_ind.html is the index file with links to the individual pages of houses.html, which is the file containing the text.

      Copy those three files over to your PSP and open file:/house.html

      Now you see the PDF text on the right side and a navigation bar with page numbers on the left. If you only want to see the text part, open file:/houses.html
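
      If you need to collect those three files from a script (for example, to copy them somewhere), the naming pattern described above can be sketched in Python like this. The function name is made up, and the pattern is only what this post describes for a default run; other options such as -noframes may well produce different files.

```python
from pathlib import PurePath


def expected_outputs(pdf_name):
    """Return the three file names described above for a default run."""
    stem = PurePath(pdf_name).stem          # "house.pdf" -> "house"
    return {
        "frameset": f"{stem}.html",         # frame file for the frameset
        "index": f"{stem}_ind.html",        # index with links to the pages
        "text": f"{stem}s.html",            # the file containing the text
    }


print(expected_outputs("house.pdf"))
```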