From: Tim C. <cl...@de...> - 2004-05-18 15:59:37
Doug:

My understanding is that the command-line magic you need to pass to pdftotext directly is a bit too cumbersome to fit into htdig.conf. Also, from a memory standpoint, my experience was that it worked better to have pdftotext write to a text/html temp file and then have htdig read from that file, rather than doing an "in-line" conversion and passthrough, which caused repeated failures for me.

If you would just like to try something different from doc2html.pl, I can recommend a script called parsepdf.pl, which I have found effective. I think it was written by Stefan Nehlsen; I believe Martin Allert gave me a copy, which he may well have changed a lot. I think it is in the contributed works section, although I should probably make sure that is true.

Good luck,
Tim
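
P.S. For illustration only, the temp-file approach mentioned above might look roughly like the sketch below. It is written in Python and is not parsepdf.pl or doc2html.pl. The function name convert_via_tempfile, the `command` parameter, and the choice of pdftotext's -q (quiet) option are my own assumptions, and a real htdig external converter would also have to deal with whatever arguments htdig passes to it.

```python
import os
import subprocess
import tempfile


def convert_via_tempfile(pdf_path, command=("pdftotext", "-q")):
    """Run `command <pdf_path> <temp file>` and return the temp file's text.

    Hypothetical helper: the name and defaults are assumptions made for
    this sketch, not part of htdig or of any contributed script.
    """
    # Create the temp file first so the converter has a concrete path to write to.
    fd, out_path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # The converter writes its output to disk; nothing is piped in-line.
        subprocess.run([*command, pdf_path, out_path], check=True)
        # Read the finished file back once the converter has exited.
        with open(out_path, encoding="utf-8", errors="replace") as handle:
            return handle.read()
    finally:
        # Remove the temp file whether or not the conversion succeeded.
        os.remove(out_path)
```

Calling convert_via_tempfile("some.pdf") would then return the extracted text, assuming pdftotext is installed and on the PATH. A failed conversion raises subprocess.CalledProcessError instead of handing back partial output.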