Hi,
I am trying to run an old version of Aperture (Alpha). I have written a program that spawns multiple threads, each of which runs Aperture's crawl().

private class SimpleCrawlerHandler implements CrawlerHandler, RDFContainerFactory {

}
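
For reference, the threading side can be sketched roughly as follows. This is a simplified, standalone sketch, not the actual program: crawlDirectory() is a hypothetical placeholder for the place where the Aperture crawler is set up and crawl() is called, and the class name, the directory names and the use of an ExecutorService are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CrawlThroughputSketch {

    // Hypothetical placeholder: the real program would create a crawler
    // for this directory here and call its crawl() method.
    static int crawlDirectory(String dir) {
        return dir.length(); // dummy result so the sketch runs on its own
    }

    // Runs one task per directory on a fixed-size pool and returns
    // the number of tasks that completed.
    static int runAll(List<String> dirs, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<Future<Integer>>();
            for (final String dir : dirs) {
                futures.add(pool.submit(new Callable<Integer>() {
                    public Integer call() {
                        return crawlDirectory(dir);
                    }
                }));
            }
            int completed = 0;
            for (Future<Integer> f : futures) {
                f.get(); // block until this task has finished
                completed++;
            }
            return completed;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> dirs = Arrays.asList("dirA", "dirB", "dirC", "dirD");
        // Time the same batch with 1, 2 and 4 threads to compare throughput.
        for (int threads = 1; threads <= 4; threads *= 2) {
            long start = System.nanoTime();
            int done = runAll(dirs, threads);
            long ms = (System.nanoTime() - start) / 1000000L;
            System.out.println(threads + " threads: " + done + " crawls in " + ms + " ms");
        }
    }
}
```

Timing the same batch with different pool sizes is how the throughput comparison in problem 1 can be made.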

I did this to extract content from multiple directories simultaneously, on demand. However, I am seeing the following two problems:

1. If I send multiple simultaneous requests, there seems to be some serialization happening somewhere: throughput does not scale linearly with the number of threads. Instead, as I increase the number of requests, the throughput of the crawler decreases significantly.

2. I want to extract the content of the files and then discard the model. However, even after calling the clear() API and forcing garbage collection, I still see a continuous increase in Java's memory usage after each request is processed.
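
One way to observe the memory growth in problem 2 is to log the used heap after each request, as in the minimal sketch below. It is an assumption about how such a check could look, not the actual measurement code: processRequest() is a hypothetical placeholder for one crawl plus the clear() call, and note that System.gc() is only a hint to the JVM.

```java
public class HeapGrowthSketch {

    // Returns the heap currently in use, after asking the JVM to collect garbage.
    static long usedHeapAfterGc() {
        Runtime rt = Runtime.getRuntime();
        System.gc(); // a request only; the JVM may ignore it
        return rt.totalMemory() - rt.freeMemory();
    }

    // Hypothetical placeholder for one request: the real program would
    // crawl a directory, read the extracted content and clear the model.
    static void processRequest(int i) {
        byte[] scratch = new byte[1024 * 1024]; // temporary garbage
        scratch[0] = (byte) i;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            processRequest(i);
            long usedKb = usedHeapAfterGc() / 1024;
            System.out.println("after request " + i + ": " + usedKb + " KB used");
        }
    }
}
```

If the logged value keeps climbing across many requests, something is still holding references to the per-request objects.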

Some of these might have been solved in a newer version, but I do not have the option of moving to a newer version of Aperture, so I need to fix these issues in the alpha version.

Any suggestions or directions would be highly appreciated.


Thanks,

-Neo