From: Abbie G. <ag...@th...> - 2003-02-05 21:17:54
Hi all,

I'm up and running, well, sort of.

I installed HTDig and pointed it at a folder containing only .html files to test it... and voila, I got results. I actually need it to index PDF files, though.

So far I have done the following:

- Added this to htdig.conf (see http://www.htdig.org/attrs.html#external_parsers):

      external_parsers: application/pdf->text/html /opt/www/htdig/bin/doc2html/doc2html.pl

- Installed the xpdf RPM.
- Installed the doc2html directory and scripts.
- Set the paths for pdftotext and pdfinfo, and set the path to the pdf2text.pl script in doc2html.pl.
- Checked the size of the largest PDF and increased the maximum file size in htdig.conf to match.

When I run ./rundig -v, it indexes the one HTML document I have at the top level, but it won't index the PDFs. Permissions on all the files are fine; I actually set them to 777 to make sure it could get into the folders. Any ideas?

I don't get any error messages either.

My file layout is /archives/folder/folder... etc.

I set the htdig start_url to http://192.168.0.25/archives/

I've also tried moving a .pdf into the /archives folder, but that doesn't work either.

Thanks!

Abbie
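Two of the sanity checks described in the post, comparing the largest PDF against the maximum document size and confirming that the xpdf tools are installed, can be sketched in Python. This is a minimal sketch, not part of HTDig or doc2html. The function names are hypothetical. It assumes pdftotext and pdfinfo are expected on the PATH (doc2html itself uses whatever paths are set in its scripts), and `/archives` is a placeholder for the local directory served at the start_url. The reported size is meant to be compared by hand against the max_doc_size attribute in htdig.conf.

```python
import os
import shutil


def largest_pdf(root):
    """Return (path, size_in_bytes) of the largest .pdf file under root, or None."""
    best = None
    # os.walk descends into nested folders such as /archives/folder/folder...
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Match .pdf case-insensitively (.pdf, .PDF, ...).
            if name.lower().endswith(".pdf"):
                path = os.path.join(dirpath, name)
                size = os.path.getsize(path)
                if best is None or size > best[1]:
                    best = (path, size)
    return best


def missing_tools(tools=("pdftotext", "pdfinfo")):
    """Return the names of tools that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Hypothetical local path; replace with the directory behind the start_url.
    root = "/archives"
    result = largest_pdf(root)
    if result is None:
        print("No PDF files found under", root)
    else:
        path, size = result
        print(f"Largest PDF: {path} ({size} bytes); max_doc_size should be at least this")
    missing = missing_tools()
    print("Missing tools:", ", ".join(missing) if missing else "none")
```

If no PDFs turn up under the local root, or the tools are missing, the external parser would have nothing to work with regardless of the htdig.conf settings.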