From: Bill A. <bil...@em...> - 2002-07-22 19:47:13
Date: Mon, 22 Jul 2002 20:31:04 +0200
To: htd...@li...
From: "htdig" <ht...@ac...>
Subject: [htdig] pdf-files?

>I have htdig running mostly as wanted.
>It does not seem to index .pdf files, and I'm sure it has to do with my
>lack of understanding.
>The only thing I found in the FAQ was 'a too narrow max_size', which is set
>to 2000000, and my largest .pdf file is about 900000.

Hi! I'll take a stab at this... If you are using Apache's directory indexing, you need to make sure your maximum document size exceeds the size of your largest directory listing (do an ls -l in the dir above where your docs reside and add some for growth), not just your largest document. The reason is that the dig reads in the index page, but if that page exceeds the maximum size it is truncated, and the dig only gets the docs listed up to that point. I had this problem before and that fixed it.

>However I think it has something to do with a lack of any "PDF2TEXT"
>conversion module I have to install ???
>Would anyone pse enlighten me what I have to do.
> The environment is RH7.1, Apache and ht://Dig 3.2.0b4

Get the scripts listed below from http://htdig.org/contrib/ and put the following in your .conf file:

    external_parsers: application/rtf->text/html /usr/local/bin/doc2html.pl \
                      text/rtf->text/html /usr/local/bin/doc2html.pl \
                      application/pdf->text/html /usr/local/bin/doc2html.pl \
                      application/postscript->text/html /usr/local/bin/doc2html.pl

You will need perl (http://www.activestate.com/Products/Download/Download.plex?id=ActivePerl) installed to use the files, as well as xpdf (http://www.foolabs.com/xpdf/).

>finn

HTH!

Bill Akins, CNE
Sr. OSA
Emory Healthcare
(404) 712-2879 - Office
12674 - PIC
bil...@em...
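For anyone checking the two causes discussed above, here is a minimal sketch in Python (not part of ht://Dig or its contrib scripts). It assumes perl and pdftotext (the command-line converter that ships with xpdf) should be on the PATH, and it uses a hypothetical document root of /var/www/html. The function names missing_tools and largest_file are made up for illustration. It only measures files on disk, so it does not tell you how large a generated directory listing is.

```python
import os
import shutil


def missing_tools(tools=("perl", "pdftotext")):
    """Return the names from `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


def largest_file(root):
    """Return (size_in_bytes, path) of the largest file under root.

    Returns (0, None) when no readable file is found.
    """
    best_size, best_path = -1, None
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # unreadable or vanished file; skip it
            if size > best_size:
                best_size, best_path = size, path
    return (max(best_size, 0), best_path)


if __name__ == "__main__":
    doc_root = "/var/www/html"  # hypothetical document root; adjust to your site
    size_limit = 2000000        # the limit quoted in the question

    absent = missing_tools()
    if absent:
        print("Not found on PATH:", ", ".join(absent))

    size, path = largest_file(doc_root)
    if path is not None and size > size_limit:
        print(f"{path} is {size} bytes, above the limit of {size_limit}")
```

If the script prints nothing, both tools were found and no file on disk exceeds the limit, which points back at the directory listing size or the external_parsers setup.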