From: David A. <D.J...@so...> - 2003-04-28 13:54:26
Most likely the problem is in your configuration file.

Replace external-parsers: with external_parsers:
Check that there are no spaces after the \ at the end of lines.
Check and check again your external_parsers: statement.
Take a look at FAQ 4.9.

David Adams
Corporate Information Services
Information Systems Services
University of Southampton

----- Original Message -----
From: "Phil Smalley" <ps...@bl...>
To: <htd...@li...>
Sent: Monday, April 28, 2003 12:32 PM
Subject: [htdig] External Parsers (Word Docs) Problem

>
> Thanks Martin, but I don't think so.......
>
> max_doc_size: 10000000
>
> The document size is actually 21504 (reported correctly in "db.docs" but
> nothing in H: (This is where the words for searching should be listed - as
> they are for a .PDF on the search).
>
> /usr/sbin/doc2html.pl `pwd`filename.doc application/msword url > z
>
> produces no errors/warnings and "z" contains a readable HTML version of the
> document (albeit a bit duplicated).
>
> Incidentally, the "UNABLE to convert" was produced at the command-line
> until the addition of "application/msword url".
>
> htdig -vv produces :-
>
> pushing http://servername.domain.gov.uk/directory/filename.doc
>
> 1:1:1:http://servername.domain.gov.uk/directory/filename.doc: size = 21504
>
> The entry for a PDF document reads:-
>
> pushing http://servername.domain.gov.uk/directory/filename.pdf
>
> 2:2:1http://servername.domain.gov.uk/directory/filename.pdf:
> title: (1).PDF
> size 15467
>
> htdig.conf = external-parsers: application/msword->text/html /usr/sbin/doc2html.pl \
>
> application/postscript->text/html /usr/sbin/conv_doc.pl \
> application/pdf->text/html /usr/sbin/conv_doc.pl
>
> Finally, I have tried changing doc2html.pl to conv_doc.pl but this produces
> EXACTLY the same results.
>
> Any other ideas?
>
> Regards
> Phil.
>
> ----- Original Message -----
> From: "Vorländer, Martin" <MV@PDV-SYSTEME.DE>
> To: <htd...@li...>
> Cc: "Phil Smalley" <ps...@bl...>
> Sent: Friday, April 25, 2003 10:59 AM
> Subject: RE: [htdig] External Parsers (Word Docs) Problems
>
> > Phil Smalley wrote:
> > Using v3.1.6 under RedHat 7.2.
> >
> > Can successfully parse .pdf files using conv_doc.pl but doc.db contains
> > only the file size for MS Word docs.
> >
> > Have tried both conv_doc.pl & doc2html.pl. Both seem to work ok when
> > executed manually at the command-line.
> >
> > Any suggestions appreciated.
>
> Terse questions beget terse answers...
>
> This wouldn't by chance be a max_doc_size problem, or would it?
>
> See:
> http://www.htdig.org/attrs.html#max_doc_size
> http://www.htdig.org/FAQ.html#q4.8
> http://www.htdig.org/FAQ.html#q4.9
> http://www.htdig.org/FAQ.html#q5.2
> http://www.htdig.org/FAQ.html#q5.37
>
> cu,
> Martin
> --
>
> _______________________________________________
> htdig-general mailing list <htd...@li...>
> To unsubscribe, send a message to <htd...@li...> with a subject of unsubscribe
> FAQ: http://htdig.sourceforge.net/FAQ.html
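
For illustration, here is a sketch of how the attribute from the quoted htdig.conf might look with both fixes applied: the name spelled with an underscore, and every continuation backslash as the very last character on its line. The paths and MIME types are copied from the quoted configuration; the indentation and the absence of blank lines between continued lines are assumptions on my part, not something the original message confirms.

```text
external_parsers: application/msword->text/html /usr/sbin/doc2html.pl \
                  application/postscript->text/html /usr/sbin/conv_doc.pl \
                  application/pdf->text/html /usr/sbin/conv_doc.pl
```

After a change like this the index would need to be rebuilt before the Word documents' words can show up in searches.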