From: Antoni M. <ant...@gm...> - 2009-08-31 14:25:35
Sean O'Connor wrote:
> Antoni,
> Great reply; you nailed the answer.
>
> I seem to have had an outdated email address in my sourceforge account,
> so I did not actually get your reply (just found it in the read-only
> online version). I can't figure out how to reply to your reply without
> actually having the message to reply to, so I'm sending this 'new'
> message, which I expect will break the threading. Anyway, I believe I've
> fixed my email problem, so the maillist difficulties shouldn't happen
> again :-).
>
> I had searched around for a couple of hours, but somehow missed the link
> you point out below. I would be happy to help with the wiki additions if
> you like. Feel free to delegate any or all of what you were going to do.
>
> I've now hit a new snag, an OutOfMemoryError: Java heap space. I will
> start a new thread about that if I cannot resolve it myself. So far,
> I've changed from a very diverse content directory where I had the OOME
> at about 1,300 files, to crawling a large source folder. I am currently
> at 14,000+ files, so I think one of the crawled files is the culprit (my
> logging messages point to a GIS file and string problems, which makes a
> bit of sense).
> Thanks again,
>
> Sean

Sean,

In general, if you store both 'normal' data and AccessData in a
NativeStore, then the most likely source of OutOfMemoryErrors is the
files themselves. We did our best to ensure that Aperture keeps only one
file in memory at any given moment. This can already be quite a lot. To
be really on the safe side, I would advise setting the heap size to four
times the size of the largest file you want to index and seeing whether
the problems go away. Archives are processed in a streaming fashion, so
if you have any 1GB zip or tar.gz archives, use the largest file inside
the archive as the reference. This means that Aperture usually breaks on
big plaintext files like 100MB logs, unless your max heap size is big
enough.
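The four-times rule above can be turned into a quick pre-crawl check. What follows is only a minimal sketch, not part of Aperture: the class name HeapEstimate and its methods are made up for illustration, it needs Java 8 or later, and it looks at files on disk only, so for archives the largest entry inside still has to be checked by hand.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper, not part of Aperture: estimates a heap size
// from the largest file under a directory, using the 4x rule of thumb.
public class HeapEstimate {

    /** Size in bytes of the largest regular file under root, or 0 if there is none. */
    public static long largestFileSize(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .mapToLong(p -> p.toFile().length())
                        .max()
                        .orElse(0L);
        }
    }

    /** Four times the given size, rounded up to whole megabytes. */
    public static long suggestedHeapMegabytes(long largestFileBytes) {
        long bytes = 4L * largestFileBytes;
        return (bytes + (1L << 20) - 1) >> 20;
    }

    public static void main(String[] args) throws IOException {
        // Directory to scan; defaults to the current directory.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        long largest = largestFileSize(root);
        System.out.println("Largest file: " + largest + " bytes");
        System.out.println("Suggested flag: -Xmx" + suggestedHeapMegabytes(largest) + "m");
    }
}
```

Running it against the crawl root prints a candidate -Xmx value for the JVM that runs the crawler. The figure is a starting point in the spirit of the advice above, not a guarantee.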
If you have any of these, you could exclude them from the crawl using the
setMaximumSize or setDomainBoundaries method of the FileSystemDataSource.

Also, there is a special case for MS Office documents. By default we
support only documents smaller than 4MB. If you need larger ones, set
the system property aperture.poiUtil.bufferSize to something larger,
like:

    System.setProperty("aperture.poiUtil.bufferSize", "16777216");

All kinds of comments welcome

Antoni Mylka
ant...@gm...
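For reference, the property value in the message is a byte count: 16777216 is 16 MB. The sketch below derives the value from a size in megabytes instead of hard-coding it. The class PoiBufferConfig and its method are hypothetical names, not Aperture API. It also assumes the property should be set before any Aperture class is used, which the message does not state but seems the safe reading.

```java
// Hypothetical helper, not part of Aperture: builds the value for the
// aperture.poiUtil.bufferSize system property from a size in megabytes.
public class PoiBufferConfig {

    /** Property name quoted in the message above. */
    public static final String PROPERTY = "aperture.poiUtil.bufferSize";

    /** Converts whole megabytes to the byte count stored in the property. */
    public static String bytesForMegabytes(int megabytes) {
        return Long.toString(megabytes * 1024L * 1024L);
    }

    public static void main(String[] args) {
        // Defaults to 16 MB, matching the value used in the message.
        int megabytes = args.length > 0 ? Integer.parseInt(args[0]) : 16;
        // Assumption: set the property before the crawler starts.
        System.setProperty(PROPERTY, bytesForMegabytes(megabytes));
        System.out.println(PROPERTY + "=" + System.getProperty(PROPERTY));
    }
}
```

The same value can be passed on the command line with the standard JVM option -Daperture.poiUtil.bufferSize=16777216, which avoids touching the code at all.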