From: Ted Stresen-R. <bow...@ho...> - 2002-08-28 20:12:20
That's perfect. That's exactly what we needed. Thank you!

BTW, I'm running rundig.sh and I'm getting the following output:

/usr/true is absent or unwilling to execute.

What is true? (and that's NOT a philosophical question ;-)

Also, any idea why the excerpt field would be empty when getting search
results?

Ted Stresen-Reuter

On 8/23/02 4:29 PM, "Gilles Detillieux" <gr...@sc...> wrote:

> According to Ted Stresen-Reuter:
>> On a related note, is there any way to customize the TITLE attribute
>> htsearch displays for pdfs? We have over 100 MB of pdfs we index every night
>> and it would be VERY helpful to be able to provide more accurate titles in
>> the search results.
>
> Well, the best way is to edit the PDF description information, in Acrobat
> Exchange, to set the title. That way, the conv_doc.pl or doc2html.pl
> script will pick it up automatically, via pdfinfo.
>
> Failing that, the other option is to put a hook into your Perl script to
> read the alternate title for a given URL from a file. Here's how I did
> it in conv_doc.pl, for some PDFs of scientific papers...
>
> --- contrib/conv_doc.pl.orig Thu Jul 12 09:38:29 2001
> +++ contrib/conv_doc.pl Thu Oct 18 12:23:58 2001
> @@ -71,6 +71,7 @@ $CATPDF = "/usr/bin/pdftotext";
>  $PDFINFO = "/usr/bin/pdfinfo";
>  #$CATPDF = "/usr/local/bin/pdftotext";
>  #$PDFINFO = "/usr/local/bin/pdfinfo";
> +$titlelist = "/home/httpd/html/SCRC/manuscripts/titles.lst";
>
>  #########################################
>  #
> @@ -183,6 +183,23 @@ if ($ishtml) {
>  print "<HTML>\n<head>\n";
>
>  # print out the title, if it's set, and not just a file name, or make one up
> +if (-r $titlelist) {
> +    if (open(INFO, "grep \"$ARGV[2]\" $titlelist 2>$null |")) {
> +        while (<INFO>) {
> +            if (/^$ARGV[2]/) {
> +                s/^$ARGV[2]\s+//;
> +                s/\s+$//;
> +                s/\s+/ /g;
> +                s/&/\&amp\;/g;
> +                s/</\&lt\;/g;
> +                s/>/\&gt\;/g;
> +                $title = $_;
> +                last;
> +            }
> +        }
> +        close INFO;
> +    }
> +}
>  if ($title eq "" || $title =~ /^[A-G]:[^\s]+\.[Pp][Dd][Ff]$/) {
>  @parts = split(/\//, $ARGV[2]); # get the file basename
>  $parts[-1] =~ s/%([A-F0-9][A-F0-9])/pack("C", hex($1))/gie;
>
>
> Here, for example, is a line from titles.lst:
>
> http://www.scrc.umanitoba.ca/SCRC/manuscripts/41.pdf Spinal circuitry of
> sensorimotor control of locomotion
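
For anyone adapting the hook quoted above, here is a minimal standalone sketch of the same lookup. It is not the patch itself but a variation under stated assumptions: the helper name lookup_title is hypothetical, the file is read directly in Perl rather than through grep, and the URL is matched literally with \Q...\E so that characters such as "." or "?" in a URL are not treated as regex metacharacters. The file layout assumed is the one shown in the titles.lst example: the URL, whitespace, then the title on one line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of conv_doc.pl): return the HTML-escaped
# title listed for $url in the file $titlelist, or "" if none is found
# or the file cannot be read.
sub lookup_title {
    my ($titlelist, $url) = @_;
    open(my $fh, '<', $titlelist) or return "";
    while (my $line = <$fh>) {
        # Match the URL literally at the start of the line, followed by
        # whitespace; capture the rest up to the last non-blank character.
        next unless $line =~ /^\Q$url\E\s+(.*\S)/;
        my $title = $1;
        $title =~ s/\s+/ /g;      # collapse runs of whitespace
        $title =~ s/&/&amp;/g;    # escape HTML special characters
        $title =~ s/</&lt;/g;
        $title =~ s/>/&gt;/g;
        close($fh);
        return $title;
    }
    close($fh);
    return "";
}

# Command-line use (assumed): perl lookup_title.pl titles.lst URL
if (!caller() && @ARGV == 2) {
    print lookup_title($ARGV[0], $ARGV[1]), "\n";
}
```

Inside conv_doc.pl the result would be assigned to $title before the existing "make one up" fallback, so an empty return value leaves that fallback in charge.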