Re: [htdig-dev] PDF Indexing problem

Greetings Abbie,

If you look a bit earlier on in the output, does it say something like
"! UNABLE to convert" or
"!!      Unable to execute pdftotext at /.../pdf2html.pl line 34."?

If it says the first, you'll have to edit your file
/opt/www/htdig/bin/doc2html/doc2html.pl -- line 77 should be
something like
my $PDF2HTML = '/.../pdf2html.pl'; # full pathname of pdf2html.pl script
where '/.../pdf2html.pl' is the path to your pdf2html.pl script.
(Type 'which pdf2html.pl' to find it.)

If it says the second, you'll have to edit your pdf2html.pl file.
At about line 20, there should be the line
my $PDFTOTEXT = "/usr/bin/pdftotext";
(Replace the path with your own path, from 'which pdftotext'.)
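
If it helps, here is a rough shell sketch for checking that a path you
are about to put into either script really points at something
executable. The function names check_tool and locate_tool are my own
invention for illustration, not part of ht://Dig or doc2html, and I am
assuming a POSIX-style shell.

```shell
# check_tool PATH: succeed only if PATH is an executable regular file.
check_tool() {
    if [ -f "$1" ] && [ -x "$1" ]; then
        echo "ok: $1"
    else
        echo "not executable: $1"
        return 1
    fi
}

# locate_tool NAME: print the full path of NAME if it is on PATH.
locate_tool() {
    command -v "$1" 2>/dev/null
}

# Example: look up pdftotext and report the path to use for $PDFTOTEXT.
found=$(locate_tool pdftotext) || found=""
if [ -n "$found" ]; then
    check_tool "$found"
else
    echo "pdftotext is not on PATH"
fi
```

The same check can be pointed at pdf2html.pl by passing its full path
to check_tool. If a path comes back as "not executable", that would be
consistent with the "Unable to execute" message above.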

Regarding highlighting words, ht://Dig *does* highlight the words in
the excerpt if they are there.  If your excerpt is too small to
contain the search terms, you can increase the max_head_length
attribute.  In the standard rundig it is 10,000 bytes.  If your
max_head_length is longer than your document length and you are
still not getting the words highlighted, let us know.
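
As a minimal sketch, raising the limit would be a one-line change in
your ht://Dig configuration file. The value 50000 is only an
illustrative assumption; pick something at least as large as your
longest document.

```text
# Store up to 50000 bytes of each document for building excerpts
# (illustrative value, not a recommendation)
max_head_length: 50000
```

Since the excerpt text is stored when the documents are indexed, you
would most likely need to re-run rundig before the change shows up in
search results.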

Out of interest, how did you overcome your earlier problem of ht://Dig
not finding the documents at all?

Cheers,
Lachlan

On Tuesday 11 February 2003 02:12, Abbie Greene wrote:
> When I run .rundig I have a large set of .txt files as well as
> .pdfs to search through. I've done all of the installation process
> for converting pdfs to text...however when I .rundig I receive the
> error message:
>
> Deleted, no excerpt: [name of pdf] for what seems like all of my
> pdf files.  Any ideas what I can do to fix this?
>
> Also, is it possible to have the words within the document
> highlighted for easier use of finding them?