ayyy - 2008-07-26

Tips For Easy Use of PDFtoHTML

1) Good: Just drag a PDF file onto pdftohtml.exe or onto a shortcut to pdftohtml.exe.
2) Better: Add options (switches) to your shortcut: Right Click on the shortcut >> Properties >> in Target, enter something like “E:\yourpath\PDFtoHTML\pdftohtml.exe -c -noframes -zoom 2.0”.
3) Best: Copy this shortcut to your SendTo folder: “C:\Profiles\Administrator\SendTo”.  Then just Right Click any PDF file >> Send To >> PDFtoHTML, and your HTML will be created in the same folder as the PDF.
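
For anyone who would rather script this than use a shortcut, here is a rough Python sketch of the same idea: it puts the switches from tip 2 in front of the PDF path, which is what the shortcut does when a file is dropped on it. The exe path is just the placeholder from the Target example, and the names build_command and convert are made up for illustration, so treat this as a starting point rather than a tested recipe.

```python
import subprocess

# Placeholder install path copied from the Target example; change it to your own.
PDFTOHTML_EXE = r"E:\yourpath\PDFtoHTML\pdftohtml.exe"

# Switches from the "Better" tip.
DEFAULT_SWITCHES = ["-c", "-noframes", "-zoom", "2.0"]


def build_command(pdf_path, exe=PDFTOHTML_EXE, switches=DEFAULT_SWITCHES):
    """Return the argument list: exe first, then the switches, then the PDF."""
    return [exe, *switches, str(pdf_path)]


def convert(pdf_path):
    """Run pdftohtml on one PDF; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(build_command(pdf_path), check=True)
```

Calling convert(r"C:\docs\report.pdf") (a hypothetical file) would run the same command line as dropping that file on the shortcut.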

Also, could the experts here advise on 2 things:
1) How can I get close to the original formatting (“-c -noframes”) AND get the JPEG files created as well?  After installing Ghostscript, I pasted these four files from the Ghostscript bin directory into my pdftohtml directory:
gsdll32.dll, gsdll32.lib, gswin32.exe, gswin32c.exe
but this created additional page image files and a larger total file size.  I'm just trying to build a minimal, formatted HTML file with images, and had no luck with the several option combinations I tried.
2) Also, how do we compile the 0.40 source to get the Windows exe?  Is an app (something like MinGW) required to do this?  Please point me in the right direction.


Here are the options in case anyone wants to comment on them:
Example: pdftohtml.exe -c -noframes -zoom 2.0
-f <int>       : first page to convert
-l <int>       : last page to convert
-q             : don't print any messages or errors
-h             : print usage information
-help          : print usage information
-p             : exchange .pdf links by .html
-c             : generate complex document
    Best format duplication.
-i             : ignore images
-noframes      : generate no frames
    No index frame, just a single HTML file.
-stdout        : use standard output
    Extracts images only.
-zoom <fp>     : zoom the pdf document (default 1.5)
    The min is 0.5 and max is 3.0.
-xml           : output for XML post-processing
-hidden        : output hidden text
-nomerge       : do not merge paragraphs
    Seems to be the same as no switches at all.
-enc <string>  : output text encoding name
-dev <string>  : output device name for Ghostscript (png16m, jpeg, etc.)
-v             : print copyright and version info
-opw <string>  : owner password (for encrypted files)
-upw <string>  : user password (for encrypted files)
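
If it helps anyone experimenting with combinations, here is a small Python sketch that turns a few of the switches listed above into an argument list. The function name build_switches and its keyword names are my own invention, and the 0.5 to 3.0 zoom check only reflects the note under -zoom, so I can't promise the tool enforces the same limits.

```python
def build_switches(first_page=None, last_page=None, zoom=None,
                   complex_output=False, noframes=False, ignore_images=False):
    """Translate keyword choices into pdftohtml switches from the list above."""
    switches = []
    if first_page is not None:
        switches += ["-f", str(first_page)]   # first page to convert
    if last_page is not None:
        switches += ["-l", str(last_page)]    # last page to convert
    if complex_output:
        switches.append("-c")                 # generate complex document
    if noframes:
        switches.append("-noframes")          # single HTML file, no index frame
    if ignore_images:
        switches.append("-i")                 # ignore images
    if zoom is not None:
        # Range assumed from the note under -zoom (min 0.5, max 3.0).
        if not 0.5 <= zoom <= 3.0:
            raise ValueError("zoom must be between 0.5 and 3.0")
        switches += ["-zoom", str(zoom)]
    return switches
```

For example, build_switches(first_page=2, last_page=5, complex_output=True, noframes=True, zoom=2.0) gives the switches for converting pages 2 to 5 with the same formatting options as the example line.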