From: Steve L. <sl...@lu...> - 2004-09-21 13:27:29
Hi All,

Thank you all for your help. We took the easy way out for now by adding .pdf and .PDF to the bad_extensions line in htdig.conf. We'll be going forward with making htdig index the PDF files in the near future.

Steve Lewis
Manager of IT, QA and Manufacturing
Lumeta Corporation
sl...@lu...
732 357-3523 Voice
732 618-6006 Cell
866 213-5250 Pager
AIM creativerecords

-----Original Message-----
From: Jim [mailto:li...@yg...]
Sent: Tuesday, September 21, 2004 3:07 AM
To: Steve Lewis
Cc: htd...@li...
Subject: Re: [htdig] Acroread message

On Fri, 17 Sep 2004, Steve Lewis wrote:
> I'm new to HtDig and have one issue that is bothersome but not a big
> problem. Every time someone uses the search engine on our site I get a
> message from our cron job as follows:
>
> PDF::parse: cannot find pdf parser /usr/local/bin/acroread

It sounds like htdig is encountering some PDFs and trying to use the default handling mechanism, which is failing due to acroread not being found. If you really want to index the PDFs, you should probably start by reading http://www.htdig.org/FAQ.html#q4.9. If you don't care about the PDFs and just want to get rid of the message, it would probably be easiest to just add .pdf to the bad_extensions attribute.

http://www.htdig.org/attrs.html#bad_extensions

Btw, I suspect what is happening is that you are getting the message you refer to each time cron tries to execute rundig, not each time someone uses the search engine on your site. The site search calls htsearch, which doesn't try to parse PDFs or do anything with cron.

Jim
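
For anyone applying the same workaround, here is a minimal sketch of the htdig.conf change described in this thread. It is an illustration only: the extensions other than .pdf and .PDF are placeholders for whatever your existing bad_extensions line already lists, so keep your own values and just append the two PDF entries.

```
# htdig.conf (sketch; the non-PDF extensions below are placeholders,
# not necessarily the defaults shipped with your ht://Dig version)
# Files whose URLs end in these extensions are skipped by htdig, so
# the PDF parser is never invoked for them.
bad_extensions: .wav .gz .zip .tar .exe .gif .jpg .jpeg .pdf .PDF
```

As far as I know, a bad_extensions line in htdig.conf replaces the built-in default rather than adding to it, so it is worth checking the attribute documentation linked above before dropping any entries. The change should take effect the next time rundig runs from cron.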