I suggest that a future release add an option to output a single specified page of the PDF file (in complex form) to stdout.
pdftohtml -stdout -page int -image|-html "file.pdf"
-stdout specifies the output
-page int specifies the page number
-image|-html specifies whether to output the png image or the html for that page
This could be used by an HTTP server to return one page at a time, converting only the page named in each request.
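To illustrate the HTTP use case, here is a rough Python sketch of a server that converts one page per request. It only uses flags that already appear in the 0.39 usage text (-q, -i, -noframes, -f, -l, -stdout), since the proposed -page and -image|-html options do not exist yet. The file path, the ?page= query parameter, the port, and the helper names are all made up for the example, and I am assuming that -stdout combined with -noframes writes the page's HTML to standard output and that pdftohtml exits non-zero on failure.

```python
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

PDF_PATH = "file.pdf"  # hypothetical path to the PDF being served


def build_command(pdf_path, page):
    """Build a pdftohtml argv that converts only `page` and writes to stdout."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    # -f and -l set to the same value restrict the conversion to one page.
    # -i skips images, because only the HTML can travel over stdout here.
    return ["pdftohtml", "-q", "-i", "-noframes",
            "-f", str(page), "-l", str(page), "-stdout", pdf_path]


def page_from_path(path):
    """Read the page number from a request path such as /?page=5 (default 1)."""
    query = parse_qs(urlparse(path).query)
    return int(query.get("page", ["1"])[0])


class PageHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        try:
            cmd = build_command(PDF_PATH, page_from_path(self.path))
        except ValueError:
            self.send_error(400, "bad page number")
            return
        try:
            result = subprocess.run(cmd, capture_output=True, timeout=60)
        except (OSError, subprocess.TimeoutExpired):
            self.send_error(500, "pdftohtml could not be run")
            return
        if result.returncode != 0:  # assumption: non-zero exit means failure
            self.send_error(500, "pdftohtml failed")
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(result.stdout)))
        self.end_headers()
        self.wfile.write(result.stdout)


def serve(host="127.0.0.1", port=8000):
    """Serve pages until interrupted."""
    HTTPServer((host, port), PageHandler).serve_forever()
```

Calling serve() and requesting http://127.0.0.1:8000/?page=5 would then run the converter for page 5 only. Returning the PNG for a page would still need the new option requested above.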
pdftohtml version 0.39 http://pdftohtml.sourceforge.net/, based on Xpdf version 3.00
Copyright 1999-2003 Gueorgui Ovtcharov and Rainer Dorsch
Copyright 1996-2004 Glyph & Cog, LLC
Usage: pdftohtml [options] <PDF-file> [<html-file> <xml-file>]
-f <int> : first page to convert
-l <int> : last page to convert
-q : don't print any messages or errors
-h : print usage information
-help : print usage information
-p : exchange .pdf links by .html
-c : generate complex document
-i : ignore images
-noframes : generate no frames
-stdout : use standard output
-zoom <fp> : zoom the pdf document (default 1.5)
-xml : output for XML post-processing
-hidden : output hidden text
-nomerge : do not merge paragraphs
-enc <string> : output text encoding name
-dev <string> : output device name for Ghostscript (png16m, jpeg etc)
-v : print copyright and version info
-opw <string> : owner password (for encrypted files)
-upw <string> : user password (for encrypted files)
If you want to extract only one page, for example page 5, then specify it as both the first and the last page in the options, as below.
pdftohtml -f 5 -l 5 -c -stdout Sample.pdf Sample.html
Adding the -c option causes an error, though: the program reports that it cannot find gswin32c (Ghostscript).
More important, though, is the ability to extract PNG images to stdout.
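As a workaround for the gswin32c error with -c, a calling script could check for Ghostscript before asking for complex output. This is only a sketch: the function names are made up, and I am assuming pdftohtml locates gswin32c through the PATH, which the error message suggests but does not confirm.

```python
import shutil


def find_ghostscript(candidates=("gswin32c", "gs")):
    """Return the full path of the first Ghostscript executable on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def complex_mode_flags():
    """Return ["-c"] only when a Ghostscript executable was found."""
    return ["-c"] if find_ghostscript() else []
```

The result of complex_mode_flags() can be spliced into the pdftohtml argument list, so the conversion falls back to simple output instead of failing when Ghostscript is missing.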