Re: [htdig] PDF Problems

According to Ricky Greer:
> I'm using htdig 3.1.6 along with acroread5 on a Sun Solaris box. When 
> building the databases, everything appears to be running normally and it 
> completes with no errors, but pdf's do not show up in the search results. 
> Anyone have a clue why?

Well, having never tried acroread version 5 with htdig, I can only guess
and offer other tips.  It may be that version 5 doesn't support the
-toPostScript command-line option, or that the PostScript it generates
differs enough from that of earlier versions that the parser in htdig
can't extract the text from it.  Try manually running the contrib/acroconv.pl
script from your htdig 3.1.6 source directory against one of your PDFs to
see if it pulls out anything meaningful.  If it doesn't, or if you'd like
to try an alternative approach, see http://www.htdig.org/FAQ.html#q4.9

If acroconv.pl does work, then the internal PDF PostScript parser should
too, as long as max_doc_size is big enough for your largest PDF file.
(See http://www.htdig.org/FAQ.html#q5.2)
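
As a rough way to check the max_doc_size point, here is a minimal Python
sketch that lists the PDFs under a directory that are larger than a given
byte limit.  It is not part of htdig; the function name and its arguments
are made up for illustration, and the limit you pass in should be whatever
max_doc_size is set to in your own htdig.conf.

```python
import os


def oversized_pdfs(root, max_doc_size):
    """Return (path, size) pairs for PDFs under root larger than max_doc_size bytes.

    Hypothetical helper, not an htdig tool: max_doc_size is assumed to be
    the byte limit copied by hand from your htdig.conf.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Match on the extension only; case-insensitive so .PDF counts too.
            if name.lower().endswith(".pdf"):
                path = os.path.join(dirpath, name)
                size = os.path.getsize(path)
                if size > max_doc_size:
                    hits.append((path, size))
    # Largest files first, since those are the ones most likely to be cut off.
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

Calling oversized_pdfs() with your document root and your current limit
returns the files that would exceed it, largest first.  An empty list
suggests that max_doc_size is not what is keeping the PDFs out of the
results.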

-- 
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)