Re: [htdig] Deleted, no excerpt with pdf files

Try running doc2html.pl from the command line:

    /opt/www/htdig/bin/doc2html.pl filename.pdf application/pdf

where filename.pdf is the full path name of a PDF document.
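To make that check repeatable, here is a minimal POSIX shell sketch. The function name check_converter is hypothetical. It assumes only what the command above already implies: the converter takes the file name and MIME type as arguments and writes its result to standard output. It shows the first 20 lines of output (an arbitrary cut-off) and fails if the converter exits non-zero or prints nothing but whitespace.

```shell
# Hypothetical helper: run an external converter by hand, show the start
# of its output, and fail if it exits non-zero or prints only whitespace.
check_converter() {
  conv=$1 file=$2 type=$3
  # Capture stdout only; the converter's own error messages still reach stderr.
  out=$("$conv" "$file" "$type")
  status=$?
  printf '%s\n' "$out" | head -n 20
  if [ "$status" -ne 0 ]; then
    echo "converter exited with status $status" >&2
    return 1
  fi
  # Strip all whitespace; anything left means the converter emitted some text.
  if [ -z "$(printf '%s' "$out" | tr -d '[:space:]')" ]; then
    echo "converter printed no text" >&2
    return 1
  fi
  return 0
}
```

Usage, with the same placeholder file name as above:

    check_converter /opt/www/htdig/bin/doc2html.pl /full/path/to/filename.pdf application/pdf

If this fails, or prints HTML with an empty body, the problem is most likely in the converter step rather than in htdig or htmerge.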

--
David Adams
Computing Services
Southampton University

  ----- Original Message -----
  From: Steve Marshall
  To: htd...@li...
  Sent: Monday, March 04, 2002 10:08 AM
  Subject: [htdig] Deleted, no excerpt with pdf files

  htDig is working fine for us on a large intranet (2 GB or so) that is entirely graphics and HTML. Of course, I want to index PDFs too.

  I am running the doc2html.pl script on a very simple test index.html file that links only to a .GIF and a small .pdf file. (I have tried parse_doc and conv_doc too.)

  I have the latest XPDF, and pdftotext works fine on the same .pdf at the command line, producing a perfect .txt file.

  When I run htDig with the -vvvvv option, it lists all the lines in that .pdf file as plain text, so it is apparently parsing properly.

  However, when I run htmerge I get a "Deleted, no excerpt" message. The wordlist file is tiny.

  I can see from an earlier response that the problem might be that the parser hasn't emitted a usable "h" record. How would I go about fixing that? Would this apply to a .txt file? The test output hasn't got any tags, of course.

  This is the only relevant uncommented line in htdig.conf:

  external parsers        application/pdf->text/html /opt/www/htdig/bin/doc2html.pl

  Any help gratefully appreciated.

  Steve Marshall