From: <MV@PDV-SYSTEME.DE> - 2003-04-28 11:31:55
Phil Smalley <ps...@bl...> wrote:
> Thanks Martin, but I don't think so.......
>
> max_doc_size: 10000000
>
> The document size is actually 21504 (reported correctly in "db.docs" but
> nothing in H: (This is where the words for searching should be listed - as
> they are for a .PDF on the search).
>
> /usr/sbin/doc2html.pl `pwd`/filename.doc application/msword url > z
>
> produces no errors/warnings and "z" contains a readable HTML version of the
> document (albeit a bit duplicated).
>
> Incidentally, the "UNABLE to convert" was produced at the command-line until
> the addition of "application/msword url".
>
> htdig -vv produces :-
>
> pushing http://servername.domain.gov.uk/directory/filename.doc
>
> 1:1:1:http://servername.domain.gov.uk/directory/filename.doc:
> size = 21504
>
> The entry for a PDF document reads:-
>
> pushing http://servername.domain.gov.uk/directory/filename.pdf
>
> 2:2:1http://servername.domain.gov.uk/directory/filename.pdf:
> title: (1).PDF
> size 15467

Another -v (i.e. htdig -vvv) and the complete log entries would certainly help.

> htdig.conf = external-parsers: \

Replace that "-" with a "_", please: the attribute is external_parsers.
Judging from the fact that PDF parsing works, I guess you didn't
copy'n'paste here?!

> application/msword->text/html /usr/sbin/doc2html.pl \
> application/postscript->text/html /usr/sbin/conv_doc.pl \
> application/pdf->text/html /usr/sbin/conv_doc.pl
>
> Finally, I have tried changing doc2html.pl to conv_doc.pl but this produces
> EXACTLY the same results.
>
> Any other ideas?

It could be related to the fact that htdig knows about PDF internally (from
the pdf_parser days), while it recognizes MS-Word documents only by their
MIME type, which it gets solely from a (correctly configured) webserver.

What MIME type does your webserver return for MS-Word files?

cu,
  Martin
-- 
Emacs would be a great | Martin Vorlaender | VMS & WNT programmer
operating system,      | work: mv...@pd...
if only it came with   | http://www.pdv-systeme.de/users/martinv/
a decent editor...     | home: ma...@ra...
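You can answer Martin's MIME-type question directly by asking the webserver what it sends for the .doc URL. What follows is a minimal Python 3 sketch. It is not part of ht://Dig, and the script name check_mime.py and both helper functions are hypothetical. The parser list is copied from Phil's quoted htdig.conf. The parsing helper assumes each converter is a single path with no arguments or quoting. It also uses a HEAD request, which assumes the server reports the same Content-Type for HEAD as for GET.

```python
#!/usr/bin/env python3
"""Hypothetical helper (not part of ht://Dig): report the MIME type a web
server sends for a URL and whether an htdig external_parsers value has an
entry for it."""
import sys
import urllib.request

# external_parsers value copied from the htdig.conf quoted in this thread.
EXTERNAL_PARSERS = r"""
application/msword->text/html /usr/sbin/doc2html.pl \
application/postscript->text/html /usr/sbin/conv_doc.pl \
application/pdf->text/html /usr/sbin/conv_doc.pl
"""


def parse_external_parsers(value):
    """Map each input MIME type to its converter path.

    Assumes the value is a sequence of 'type[->output-type] command' pairs
    in which each command is a single path without arguments or quotes.
    """
    tokens = value.replace("\\", " ").split()  # drop line continuations
    parsers = {}
    # Tokens alternate: type spec, then converter path.
    for spec, command in zip(tokens[0::2], tokens[1::2]):
        parsers[spec.split("->", 1)[0]] = command
    return parsers


def served_content_type(url, timeout=10):
    """Send a HEAD request and return the bare MIME type of the reply.

    get_content_type() strips parameters such as '; charset=...' and falls
    back to 'text/plain' when the server sends no Content-Type at all.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get_content_type()


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: check_mime.py URL")
    else:
        mime = served_content_type(sys.argv[1])
        parsers = parse_external_parsers(EXTERNAL_PARSERS)
        print("server reports:", mime)
        print("external parser:", parsers.get(mime, "none configured"))
```

Run it as `python3 check_mime.py http://servername.domain.gov.uk/directory/filename.doc`. If it prints something other than application/msword (application/octet-stream, for instance), htdig will never hand the file to doc2html.pl. On an Apache server, the usual fix would be a mod_mime line such as `AddType application/msword .doc`.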