From: Martin A. <al...@ar...> - 2004-02-26 09:06:29
Hello Tim,

Did you take a look at http://www.htdig.org/attrs.html#external_parsers ?

Your external parser has to accept the parameters described there, so in
fact you have to write a shell wrapper for it.

I attached my parser scripts so you can use them right away. I use:

doc2html: wvware (wvware.sourceforge.net), which is really powerful
pdf2html: pdftohtml (not xpdf, but a self-patched version of pdftohtml
          with the xpdf3 libraries, so I can parse PDF 1.5)
ppt2html: ppthtml
xls2html: xlhtml

Just modify the variables in the scripts so they point to the proper
locations. I do fairly extensive logging, with a separate logfile for each
parser, so I can identify the documents that could not be converted
(just in case users ask :).

Yours, Martin

On Wed, Feb 25, 2004 at 05:22:46PM -0500, Tim Cleary wrote:
> Thanks for everyone's suggestions on my problem yesterday.
>
> A new one:
> I am running into trouble with external conversion- it is not working.
>
> Basically I have 3 types of files I want to convert - MS Excel, MS
> Powerpoint, and PDF. I have installed a utility for each in /usr/local/bin:
> xlhtml for excel, ppthtml for powerpoint, pdftohtml for pdf. Each generates
> standard output to the screen just fine when called from the command line,
> and when output is directed to a file, it is created as "text/html" so I
> thought that it would work to have them tagged as external converters via
> htdig.conf. The htdig.conf file is as follows:
> ....
> external_parsers: application/vnd.ms-excel->text/html
> /usr/local/bin/xlhtml \
> application/vnd.ms-powerpoint->text/html /usr/local/bin/ppthtml \
> # application/pdf->text/html "/usr/local/bin/pdftohtml
> -noframes -I -stdout"
> ...
>
> On htdig run through rundig, I get a header-line input that says
> "content-type: application/vnd.ms-powerpoint, not HTML" and then it moves
> onto the next item. It doesn't even work for the pdf.
>
> Then for each file I get a "deleted, no excerpt" when it goes to merge.
>
> I feel like I am following the formatting correctly. I have tried different
> versions of the application type (msword, ms.word, doc, etc.). I am running
> OS X so these were the specific application types it listed (using file -i).
>
> Thanks for any suggestions.
>
> Tim Cleary
>
> --
> Tim Cleary
> Manager
> Dean & Company
> (703) 760-4375
> cl...@de...

--
--------------------------------------------------------
arago AG, Institut fuer komplexes Datenmanagement
Am Niddatal 3, 60488 Frankfurt/Main, al...@ar...
Tel. 069/405680, Fax 069/40568111, http://www.arago.de
--------------------------------------------------------
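
For readers of the archive, the wrapper idea described above can be sketched roughly as follows. This is only an illustration and is not one of the scripts attached to the original message. It assumes, per the external_parsers documentation, that htdig appends the input file name, the content type, the URL, and the config file path to the configured command, and that each converter writes HTML to standard output when given a file name. The converter paths are the ones from the quoted message; the pdftohtml flags are copied verbatim from the quoted htdig.conf and should be checked against the local pdftohtml. The names `CONVERTERS`, `build_command`, and `main` are hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical converter wrapper for an ht://Dig external_parsers entry."""
import subprocess
import sys

# Assumed install locations, taken from the quoted message; adjust as needed.
# The pdftohtml flags are copied from the quoted htdig.conf and are unverified.
CONVERTERS = {
    "application/vnd.ms-excel": ["/usr/local/bin/xlhtml"],
    "application/vnd.ms-powerpoint": ["/usr/local/bin/ppthtml"],
    "application/pdf": ["/usr/local/bin/pdftohtml", "-noframes", "-I", "-stdout"],
}


def build_command(content_type, infile):
    """Return the argv list for the converter matching content_type, or None."""
    # Drop any parameters such as "; charset=..." before the lookup.
    mime = content_type.split(";")[0].strip().lower()
    base = CONVERTERS.get(mime)
    if base is None:
        return None
    return base + [infile]


def main(argv):
    # Assumed argument order: infile, content-type, URL, config file.
    if len(argv) < 3:
        print("usage: wrapper infile content-type [url] [config]", file=sys.stderr)
        return 2
    infile, content_type = argv[1], argv[2]
    cmd = build_command(content_type, infile)
    if cmd is None:
        print("no converter for " + content_type, file=sys.stderr)
        return 1
    # The converter's HTML goes to our stdout, which htdig reads.
    return subprocess.call(cmd)


# Only act as a converter when arguments were actually passed.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

With a wrapper like this saved at some executable path (for example a hypothetical /usr/local/bin/conv_wrapper.py), each external_parsers entry would name that one path instead of the raw converter, so no quoted command with embedded options is needed in htdig.conf. Per-parser logging of failed conversions, as mentioned above, could be added around the `subprocess.call` line.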