From: htdig <ht...@ac...> - 2002-08-31 09:24:53
I have the following in htdig.conf:

    external_parsers: application/rtf->text/html /usr/local/bin/doc2html.pl \
                      text/rtf->text/html /usr/local/bin/doc2html.pl \
                      application/pdf->text/html /usr/local/bin/doc2html.pl \
                      application/postscript->text/html /usr/local/bin/doc2html.pl

In /usr/local/bin I have the following files:

    -rw-r--r-- 1 root root  2207 aug 30 00:46 acroconv.pl
    -rw-r--r-- 1 root root 17000 aug 29 11:55 doc2html.pl
    -rw-r--r-- 1 root root  2368 aug 30 00:48 parsepdf.pl
    -rw-r--r-- 1 root root  4083 aug 29 11:44 pdf2html.pl
    -rw-r--r-- 1 root root  1324 aug 29 11:45 swf2html.pl

In doc2html.pl I made the following change:

    # PDF to HTML conversion script
    # Full pathname of Perl script pdf2html.pl
    my $PDF2HTML = '/usr/local/bin';

and there is the following section (of which I don't understand much):

    # Adobe PDF file using Perl script
    if ($PDF2HTML) {
        $mime_type = "application/pdf";
        $cmd = $PDF2HTML;
        # Replace default title (if used) with filename:
        $cmdl = "$cmd $Input $mime_type $name";
        $magic = '%PDF-|\0PDF CARO\001\000\377';
        &store_html_method('PDF (pdf2html)',$cmd,$cmdl,$mime_type,$magic);
    }

In pdf2html.pl:

    #### YOU MUST SET THESE ####
    my $PDFTOTEXT = "/usr//bin/pdftotext";
    my $PDFINFO = "/usr/bin/pdfinfo";
    #

and in /usr/bin I have the following files:

    [root@WebSrv bin]# ls /usr/bin/pd*
    /usr/bin/pdf2dsc  /usr/bin/pdfimages  /usr/bin/pdftopbm  /usr/bin/pdftotext
    /usr/bin/pdf2ps   /usr/bin/pdfinfo    /usr/bin/pdftops   /usr/bin/pdiff

When I run rundig, some of the output lines show:

    28:138:1:http://www.acnord.dk/pdf/?N=D: *****-------- size = 1486
    30:139:1:http://www.acnord.dk/pdf/?M=A: *+***-------- size = 1486
    31:140:1:http://www.acnord.dk/pdf/?S=A: **+**-------- size = 1486

That is, the directory /pdf/ contains some PDF files, but their names don't show up.

When I run htdig -vv, some lines show:

    344:417:1:http://www.acnord.dk/pdf/?M=A: (changed)
    title: Index of /pdf
    *****
    url rejected: (level 1)http://www.acnord.dk/pdf/ugekurser.pdf
    url rejected: (level 1)http://www.acnord.dk/pdf/ugekurser0203.pdf
    url rejected: (level 1)http://www.acnord.dk/pdf/vovkatalog.pdf
    url rejected: (level 1)http://www.acnord.dk/pdf/op10-lo.mp3
    url rejected: (level 1)http://www.acnord.dk/pdf/op10.mp3
    url rejected: (level 1)http://www.acnord.dk/pdf/SFO-IT.pdf
    url rejected: (level 1)http://www.acnord.dk/pdf/AVG.pdf
    url rejected: (level 1)http://www.acnord.dk/pdf/samlinger.pdf
    size = 1486

I can't figure out why they are rejected (they are not in the badext list).

One thing that concerns me is that my server (RH7) runs in text mode only. Do I have to run startx in order for xpdf to work?

Yours,
Finn