From: David A. <D.J...@so...> - 2005-01-19 10:46:05
How are you using htdig to index .ppt files? Recent versions of doc2html.pl
have a default input limit of 20 Mbytes and will not try to convert files any
larger. Just increase the limit in the doc2html.pl script.

I have found that ppthtml 0.4 from www.xlhtml.org (now relocated to
http://chicago.sourceforge.net/xlhtml), which is what I use, does not always
succeed in extracting text after the first embedded image.

I have not found problems with ppthtml on RedHat Linux, but on Solaris the
process size could be very large. With .ppt files over 20 Mbytes I doubt that
it would run.

David Adams
Corporate Information Services
Information Systems Services
University of Southampton

----- Original Message -----
From: "F. Spitzer, GEOSYSTEMS" <f.s...@ge...>
To: <htd...@li...>
Sent: Wednesday, January 19, 2005 6:48 AM
Subject: [htdig] Indexing large Powerpoints

> Good morning List!
>
> I have one problem to solve. Maybe you can help me?
>
> We have a huge PowerPoint collection (more than 250 files), so I want htdig
> to build an index that lets users search for keywords.
>
> Things are working so far. Htdig does its job quite well. The only
> problem I still have is with ppt files larger than 20 MB.
> Unfortunately, nearly 50% of the files are larger than 20 MB.
>
> I set max_doc_size to 80000000 (80 MB, the size of the largest ppt). But
> running htdig produces the following output: Input file size of
> 45956608 at or above 20000000 limit.
> It seems to me that there is another limit in htdig that ignores
> the value set by max_doc_size.
>
> How can I overcome this limitation?
>
> I thought about writing a shell script that converts the ppt files to
> html before running htdig. Htdig would then use the html files to build
> the index. Using url_part_aliases during db creation and during the
> search would map the html document location back to the original ppt location.
>
> Has anybody done this before? Or, even better, is there another solution
> for my problem?
>
> Thanks a lot for your help. Any hints are welcome.
>
> Cheers, Fritz
>
> Fritz Spitzer
> Head of Training and Systems Integration
>
> --------------------------------------------------------------------
> GEOSYSTEMS GmbH
> Riesstraße 10, D-82110 Germering, GERMANY
> www.geosystems.de
>
> E: f.s...@ge...
> T: +49-(0)89-89 43 43 -0 (Ext. -20)
> F: +49-(0)89-89 43 43 99
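
The pre-conversion idea from the quoted message (convert each .ppt to HTML before running htdig, then index the HTML) could be sketched roughly as follows. This is only an illustration under stated assumptions, not something taken from htdig or doc2html.pl: the function names are made up for the example, and it assumes that ppthtml is on the PATH and writes its HTML to standard output when given a file name. Mapping the HTML locations back to the .ppt URLs is left to the htdig configuration, as proposed above.

```python
import subprocess
import sys
from pathlib import Path


def plan_conversions(ppt_root, html_root):
    """Pair every .ppt under ppt_root with an .html path under html_root.

    The directory layout is mirrored, so only the root and the file
    extension differ between a source file and its converted copy.
    """
    ppt_root, html_root = Path(ppt_root), Path(html_root)
    plan = []
    for src in sorted(ppt_root.rglob("*.ppt")):
        rel = src.relative_to(ppt_root)
        plan.append((src, html_root / rel.with_suffix(".html")))
    return plan


def convert_all(plan, converter=("ppthtml",)):
    """Run the converter on each source file and save its stdout as HTML.

    ASSUMPTION: the converter takes the input file as its last argument
    and prints HTML to stdout. Returns the sources whose conversion
    exited with a non-zero status, so they can be checked by hand.
    """
    failed = []
    for src, dst in plan:
        dst.parent.mkdir(parents=True, exist_ok=True)
        with open(dst, "wb") as out:
            result = subprocess.run([*converter, str(src)], stdout=out)
        if result.returncode != 0:
            failed.append(src)
    return failed
```

With hypothetical paths, a run before each htdig pass might look like `failed = convert_all(plan_conversions("/data/ppt", "/data/ppt-html"))`. Returning the failures instead of stopping at the first one seems the safer choice here, given the caveats above about ppthtml on very large files.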