From: Arianna A. <ar...@ds...> - 2006-11-29 14:10:18
Hi everybody,

I'm trying to run htDig and have it parse PDF and RTF files. Yes, trying... I've read the whole FAQ, but I can't get past these two (related?) problems.

1) If I run rundig as root, I get this while it parses a PDF file:

    PDF::parse(http://segramm.dico.unimi.it/common/docs/contratti/piano_utilizzo.pdf)
    PDF::parse: error running pdf_parser on http://segramm.dico.unimi.it/common/docs/contratti/piano_utilizzo.pdf
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
    size = 78585

But if I run it as a normal user, I get a different error:

    PDF::setContents(78585 bytes)
    PDF::parse(http://segramm.dico.unimi.it/common/docs/contratti/piano_utilizzo.pdf)
    PDF::parse: cannot open //usr/share/webapps/htdig/3.1.6-r7/hostroot/htdig/db/htdig14605.pdf
                ^^^^^^^^^^^
    size = 78585

WHAT? My htdig.conf says:

    database_dir: /tmp/db

/usr/share/webapps/htdig/3.1.6-r7/hostroot/htdig/db was the *original* database_dir, which I commented out.

If I run:

    ./doc2html.pl /var/www/segramm/htdocs/common/docs/contratti/piano_utilizzo.pdf "application/pdf" http://segramm.dico.unimi.it/common/docs/contratti/piano_utilizzo.pdf

the file is converted to HTML without problems...

So, can anybody tell me how to solve this?

2) I've installed rtf2html and set the full path to its executable in doc2html.pl. In htdig.conf I've added:

    external_parser: application/pdf->text/html /usr/local/script/doc2html.pl \
                     application/rtf->text/html /usr/local/script/doc2html.pl \
                     text/rtf->text/html /usr/local/script/doc2html.pl \

But when I run rundig I get:

    101:137:2:http://segramm.dsi.unimi.it/common/docs/contratti/piano_utilizzo.rtf:
    Retrieval command for http://segramm.dsi.unimi.it/common/docs/contratti/piano_utilizzo.rtf: GET /common/docs/contratti/piano_utilizzo.rtf HTTP/1.0
    User-Agent: htdig/3.1.6 (str...@ds...)
    Referer: http://segramm.dsi.unimi.it/index.php/sid=3;go=contratti
    Host: segramm.dsi.unimi.it

    Header line: HTTP/1.1 200 OK
    Header line: Date: Wed, 29 Nov 2006 13:53:31 GMT
    Header line: Server: Apache
    Header line: Last-Modified: Mon, 20 Nov 2006 16:25:16 GMT
    Converted Mon, 20 Nov 2006 16:25:16 GMT to Mon, 20 Nov 2006 16:25:16
    Header line: ETag: "3d3e4-363d-29b29300"
    Header line: Accept-Ranges: bytes
    Header line: Content-Length: 13885
    Header line: Connection: close
    Header line: Content-Type: text/rtf
    Header line:
    returnStatus = 0
    Read 8192 from document
    Read 5693 from document
    Read a total of 13885 bytes
    "text/rtf" not a recognized type. Assuming text
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Why? Could anybody help me?

Cheers,
Arianna