From: <NKo...@gm...> - 2003-07-24 07:56:28
Hello Gilles,

Steps of my installation:

1. Edit the htdig.conf configuration file to use the script:

external_parsers: application/rtf->text/html /srv/www/htdocs/htdig/doc2html.pl \
text/rtf->text/html /srv/www/htdocs/htdig/doc2html.pl \
application/pdf->text/html /srv/www/htdocs/htdig/doc2html.pl \
application/postscript->text/html /srv/www/htdocs/htdig/doc2html.pl \
application/msword->text/html /srv/www/htdocs/htdig/doc2html.pl \
application/wordperfect5.1->text/html /srv/www/htdocs/htdig/doc2html.pl \
application/msexcel->text/html /srv/www/htdocs/htdig/doc2html.pl \
application/vnd.ms-excel->text/html /srv/www/htdocs/htdig/doc2html.pl \
application/vnd.ms-powerpoint->text/html /srv/www/htdocs/htdig/doc2html.pl
application/x-shockwave-flash->text/html /srv/www/htdocs/htdig/doc2html.pl \
application/x-shockwave-flash2-preview->text/html /srv/www/htdocs/htdig/doc2html.pl

2. In script doc2html.pl:

my $PDF2HTML = '/srv/www/htdocs/htdig/pdf2html.pl';

3. In script pdf2html.pl:

my $PDFTOTEXT = "/srv/www/htdocs/xpdf-2.02pl1-linux/pdftotext";
my $PDFINFO = "/srv/www/htdocs/xpdf-2.02pl1-linux/pdfinfo";

WHAT IS FALSE???

Thank you!

With Best Regards
Natalya Kolesnikova

> According to Natalya Kolesnikova:
> > I'm trying to search in pdf-Files with htdig, but without success!!
> > Anybody knows how to do it ????
>
> See http://www.htdig.org/FAQ.html#q4.9
>
> --
> Gilles R. Detillieux              E-mail: <gr...@sc...>
> Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
> Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
From: Gilles D. <gr...@sc...> - 2003-07-31 20:04:26
According to NKo...@gm...:
> Steps of my installation:
>
> 1. Edit the htdig.conf configuration file to use the script:
> external_parsers: application/rtf->text/html
> /srv/www/htdocs/htdig/doc2html.pl \
> text/rtf->text/html /srv/www/htdocs/htdig/doc2html.pl \
> application/pdf->text/html /srv/www/htdocs/htdig/doc2html.pl \
> application/postscript->text/html /srv/www/htdocs/htdig/doc2html.pl \
> application/msword->text/html /srv/www/htdocs/htdig/doc2html.pl \
> application/wordperfect5.1->text/html /srv/www/htdocs/htdig/doc2html.pl \
> application/msexcel->text/html /srv/www/htdocs/htdig/doc2html.pl \
> application/vnd.ms-excel->text/html /srv/www/htdocs/htdig/doc2html.pl \
> application/vnd.ms-powerpoint->text/html /srv/www/htdocs/htdig/doc2html.pl
> application/x-shockwave-flash->text/html /srv/www/htdocs/htdig/doc2html.pl
> \
> application/x-shockwave-flash2-preview->text/html
> /srv/www/htdocs/htdig/doc2html.pl

Well, make sure that all the lines of the definition except the last one
end with a backslash, and that there's no space after the backslash and
before the newline character. Also, make sure each line ends with a
newline character (or ASCII LF), and not just a carriage return (CR) as
text files from old Mac OS versions do.

I assume the first line above, and the last two, were folded by your mail
program and that they actually aren't folded in your file the way they are
above. But check to make sure. I don't see a backslash after the third to
last entry (vnd.ms-powerpoint), so the two entries after that would be
ignored.

> 2.In script doc2html.pl:
> my $PDF2HTML = '/srv/www/htdocs/htdig/pdf2html.pl';
>
> 3.In script pdf2html.pl:
> my $PDFTOTEXT = "/srv/www/htdocs/xpdf-2.02pl1-linux/pdftotext";
> my $PDFINFO = "/srv/www/htdocs/xpdf-2.02pl1-linux/pdfinfo";
>
> WHAT IS FALSE???

Those definitions look fine to me, as long as the files are all where you
say they are.
You should then test doc2html.pl from the command line to make sure it
works on one of your PDF files:

    /srv/www/htdocs/htdig/doc2html.pl /full/path/to/your/file.pdf \
        application/pdf http://host/your/file.pdf

If it does, then try indexing just a single PDF file through
htdig -i -vvvv, setting start_url to the URL of that PDF file.

> > According to Natalya Kolesnikova:
> > > I'm trying to search in pdf-Files with htdig, but without success!!
> > > Anybody knows how to do it ????
> >
> > See http://www.htdig.org/FAQ.html#q4.9

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)
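
The line-ending checks described in the reply above (a backslash on every
line but the last, no whitespace after the backslash, LF rather than CR
line endings) can be roughly automated. The following is only a sketch in
Python, not part of ht://Dig: the function name check_attribute and the
rule "a following line containing '->' is probably an orphaned entry" are
assumptions made for illustration, and the heuristic may misfire on
unrelated lines that happen to contain '->'.

```python
import re

def check_attribute(raw, name="external_parsers"):
    """Return a list of warnings about one multi-line attribute.

    Hypothetical helper: 'raw' is the full text of the config file,
    'name' is the attribute whose continuation lines are checked.
    """
    warnings = []
    if "\r" in raw:
        warnings.append("file contains carriage returns (CR); "
                        "lines should end with LF only")

    lines = re.split(r"\r\n|\r|\n", raw)
    in_block = False    # True once the attribute has been found
    continued = False   # True if the previous line ended with a backslash

    for number, line in enumerate(lines, start=1):
        if not in_block:
            if not line.lstrip().startswith(name + ":"):
                continue
            in_block = True
        elif not continued:
            # The previous line closed the definition. If this line still
            # looks like a parser entry (heuristic: it contains "->"),
            # the backslash was probably forgotten.
            if "->" not in line:
                break
            warnings.append(
                f"line {number}: entry is outside the definition; "
                f"line {number - 1} has no trailing backslash")

        if re.search(r"\\[ \t]+$", line):
            warnings.append(
                f"line {number}: whitespace after the trailing backslash")

        # Treat "backslash plus stray whitespace" as an intended continuation
        # so that the problem is reported only once.
        continued = line.rstrip().endswith("\\")

    return warnings


# Shortened sample based on the definition quoted in this thread.
SAMPLE = r"""external_parsers: application/pdf->text/html /srv/www/htdocs/htdig/doc2html.pl \
application/vnd.ms-powerpoint->text/html /srv/www/htdocs/htdig/doc2html.pl
application/x-shockwave-flash->text/html /srv/www/htdocs/htdig/doc2html.pl
"""

for message in check_attribute(SAMPLE):
    print(message)
```

To check a real file, read it with open(path, newline="") so that Python
does not translate CR characters before the function sees them, and pass
the resulting string to check_attribute.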