From: David A. <D.J...@so...> - 2003-01-02 10:00:52

Try:

external_parsers: \
    application/pdf->text/html /usr/local/bin/doc2html.pl

I have to agree with you that this is not easy. Only it isn't easy to make
it easy either!

--
David Adams
Information Systems Services
Southampton University

----- Original Message -----
From: "Michael Friendly" <friendly@YorkU.CA>
To: <htd...@li...>
Cc: "Geoff Hutchison" <ghu...@ws...>
Sent: Saturday, December 28, 2002 5:01 PM
Subject: Re: [htdig] can't index PDF files

> OK, so I re-read all the FAQ sections, configured doc2html and pdf2html,
> and used them as external parsers, with
>
> external_parsers: \
>     application/pdf /usr/local/bin/doc2html
>
> now, I get hundreds of error messages,
>
> External parser error: unknown field in line <HTML>
>  URL: .... vcdstory.pdf
> External parser error: unknown field in line <HEAD>
>  URL: .... vcdstory.pdf
> ....
>
> It's not clear to me why this should be so hard.
>
> Geoff Hutchison wrote:
>
> >On Thu, 26 Dec 2002, Michael Friendly wrote:
> >
> >>I've read the FAQ on this topic, but still can't get rundig to index pdf
> >>files. I have set
> >>
> >>max_doc_size: 500000
> >>
> >>pdf_parser: /usr/bin/htdig-pdfparser
> >>debian_pdf_parser: xpdf
> >>
> >>and verified that pdftotext works from the command line on my debian
> >>
> >
> >No, I don't think this is what you want to do. The pdf_parser attribute is
> >now quite depreciated--it really, truly expects Acrobat-generated PS
> >files.
> >
> >I'd look at the FAQ again (specifically q4.9):
> >http://www.htdig.org/FAQ.html#q4.9
> >
> >--
> >-Geoff Hutchison
> >Williams Students Online
> >http://wso.williams.edu/
> >
>
> --
> Michael Friendly     Email: fri...@yo...
> Professor, Psychology Dept.
> York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
> 4700 Keele Street    http://www.math.yorku.ca/SCS/friendly.html
> Toronto, ONT  M3J 1P3 CANADA
>
> _______________________________________________
> htdig-general mailing list <htd...@li...>
> To unsubscribe, send a message to <htd...@li...> with a subject of unsubscribe
> FAQ: http://htdig.sourceforge.net/FAQ.html
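
A note on why the "->text/html" part matters, as far as I understand htdig's behaviour: when only a content type is given, htdig treats the program as an external parser proper and expects its output in htdig's own record format, so the HTML written by doc2html is rejected line by line ("unknown field in line <HTML>"). With "application/pdf->text/html" the program is treated as a converter, and its output is handed to htdig's built-in HTML parser. A minimal sketch of the relevant htdig.conf lines, assuming doc2html.pl is installed at the path used in this thread (adjust it to your system):

```
# Sketch only: the script path is taken from this thread and may differ
# on your installation.
#
# Converter form: <input type>-><output type> <program>
# htdig runs the program on each PDF and parses the result as HTML.
external_parsers: \
    application/pdf->text/html /usr/local/bin/doc2html.pl
```

After changing the attribute, rerun rundig so the PDF documents are fetched and converted again.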