From: Dan M. <dm...@in...> - 2003-06-23 20:08:31
> Hello Dan,

Hello Martin,

> AFAIK if you don't want to use an external parser, there is only one
> possibility left:

I do want to use an external parser, but I need one other than xpdf, or I need to get xpdf installed minimally.

> Use the internal parser command as described in
>
> http://www.htdig.org/attrs.html#pdf_parser

I'm using 3.2.0; not sure if it has that attribute.

> Hence, for this to work you need to install Acrobat Reader
> from Adobe. Regardless of which OS you use and the latest
> security flaws for Acrobat Reader, which are hopefully fixed
> for your OS also, the internal pdf_parser command
> is very limited.
>
> htmldoc or doc2html just calls xpdf for the pdf2text

Really? I have htmldoc on another server and there is not a whimper about
xpdf anywhere on the server except in the ports collection.
I think htmldoc only converts HTML to HTML, PostScript, or PDF, not the other way around anyway.

> conversion. An additional disadvantage using Acrobat Reader
> is the fact that you will index PostScript files and you
> _have_ to adjust the max_doc_size in your htdig.conf above the
> size of your biggest PDF/PostScript file.
>
> BUT:
> Installing some X11 library stuff does not mean to run the
> service :)). If you use some kind of Linux with rpm, try

Actually I'm trying via rpm but it keeps failing. I'll try to install xpdf with --nodeps like you suggested. I'm on RedHat 8.0 if that helps.

Thanks for the reply!

Dan

> installing only the libraries with --nodeps. Using Solaris
> and depending on which Solaris (7/8/9) you use, you don't
> need to satisfy all dependencies. Ask me again for a list of
> packages :).
>
> Yours,
>
> Martin
>
> --
>
> --------------------------------------------------------
> arago AG, Institut fuer komplexes Datenmanagement
> Am Niddatal 3, 60488 Frankfurt/Main, al...@ar...
> Tel. 069/405680, Fax 069/40568111, http://www.arago.de
> --------------------------------------------------------
>
> On Mon, Jun 23, 2003 at 02:06:03PM -0500, Dan Muey wrote:
> > Hello list,
> >
> > I'd like to parse and index pdf files but when I try to install xpdf
> > it wants/needs to install a bunch of x windows stuff which I don't
> > want to do but even if I try to it keeps failing.
> >
> > So what I'd like to ask is this:
> >
> > Has anyone successfully used something else beside xpdf, like htmldoc
> > for instance, to be able to index pdf files?
> >
> > If so any pointers/documentation would be very helpful.
> >
> > TIA
> >
> > Dan
> >
> > _______________________________________________
> > htdig-general mailing list <htd...@li...>
> > To unsubscribe, send a message to
> > <htd...@li...> with a subject of unsubscribe
> > FAQ: http://htdig.sourceforge.net/FAQ.html
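
A quick way to check whether a minimal xpdf install is usable for text extraction, independent of ht://Dig, might look like the sketch below. It assumes that xpdf's command-line tool pdftotext ends up on the PATH after the --nodeps install; the function names pdftotext_command and extract_text are hypothetical helpers made up for this illustration, not part of ht://Dig or xpdf.

```python
import shutil
import subprocess
from typing import List, Optional


def pdftotext_command(pdf_path: str, binary: str = "pdftotext") -> List[str]:
    """Build the command line; "-" as output file means write text to stdout."""
    return [binary, pdf_path, "-"]


def extract_text(pdf_path: str, binary: str = "pdftotext") -> Optional[str]:
    """Return the text of a PDF, or None if the converter is not installed."""
    # Assumption: the converter is found via PATH; adjust `binary` otherwise.
    if shutil.which(binary) is None:
        return None
    result = subprocess.run(
        pdftotext_command(pdf_path, binary),
        capture_output=True,  # requires Python 3.7 or later
        text=True,
        check=True,           # raise CalledProcessError if the conversion fails
    )
    return result.stdout
```

If extract_text returns None, the binary was not found at all; if it raises, the binary is present but could not run or convert the file, which is the case a failed or partial dependency install would most likely produce.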