From: Martin A. <al...@ar...> - 2004-08-31 13:11:02
Hi David,

On Tue, Aug 31, 2004 at 02:02:35PM +0100, David Adams wrote:
> Martin,
>
> This is a joke, yes?
>
> If not, please note that the Lucene FAQs make it clear that it is equally
> dependant on external parsers.

That's what a colleague recommended to me: to check the Lucene FAQs for any alternative to the parsers already mentioned. Honestly, I didn't look there myself before writing my email. :(

The fact is: indexing a web tree with the mentioned ppthtml, xlhtml or xpdf takes about ten times longer than indexing only .doc and .html files, and it drives the load average up to 10 on a dual-processor Sun V480 with 4 GB RAM. When indexing only .doc and .html files, rundig completes in about 30 minutes. I find hanging ppthtml and xlhtml processes, each using nearly 95% CPU and about 1 GB RAM per document. Of course, those processes never return and have to be killed... :(

> We use wp2html to convert Word documents and it's fine, but we bought it only
> because we needed to convert Wordperfect documents (not that we get many!)
>
> David Adams

Yours,
Martin

-- 
--------------------------------------------------------
arago AG, Institut fuer komplexes Datenmanagement
Am Niddatal 3, 60488 Frankfurt/Main, al...@ar...
Tel. 069/405680, Fax 069/40568111, http://www.arago.de
--------------------------------------------------------