[pdftohtml] output is (mostly) nonsense
From: Kent R. <ken...@si...> - 2006-10-04 06:55:35
This is a resend. Since writing it, I found your archives, and I now see why you don't allow non-member posting. I assume you'll get through all that spam to find the real posts some time in the next couple years... in the mean time, I'll post this as a member.

--

Hi,

Please excuse me if there is an archive for this list; I couldn't find one or links to one on http://pdftohtml.sourceforge.net/.

I'm using pdftohtml for the first time, and having looked through the man page, and tried many different configurations of command line options, I'm getting nothing like the pdf document. The html is fine, index and links and all, but the content of the pages looks like the following (I have a screenshot I could send, if that would help):

! " # $ $% ! ! !!&$" $! &$" $! & $ " ! $ $ ' $$

Every once in a while (almost once per page, but not quite), there is a line, and sometimes a paragraph, of text from the pdf. The number of pages is correct.

I tried with -enc UTF-8, but it looks like there isn't a switch for input encoding, if I felt adventurous enough to play with that.

Anyway, I'm assuming there is something straightforward that I'm missing, but I'm not sure what, and I haven't found this discussed.

btw, I'm running Ubuntu 6.06.

--
Kent Rasmussen
SIL Eastern Congo Group Linguist
020 608593/4/5 x130
0733-710235(office)
0722-620510(office)
0735-539687(Personal)