GSearch: optimize options for Lucene indexing, proposed by FIZ Karlsruhe
(Matthias Razum and Michael Hoppe) for the eSciDoc project:
While testing our project, we discovered another bottleneck in fedoragsearch:
We have a test that creates and indexes a large number of Fedora objects (about 1500) in a loop. After creating each object, we index it right away by calling fedoragsearch, and then we create the next one. The more objects we created, the slower the system became. We discussed this and suspect the cause is that the index is optimized every time a single object is indexed.
So we changed the code in the OperationsImpl class so that the index is optimized only after every 1000th object. Running the test again, we saw that the system became much faster.
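The idea of optimizing only after every Nth object can be sketched as follows. This is a minimal illustration, not the actual change made in OperationsImpl: the IndexBackend interface, the BatchedOptimizingIndexer class and the CountingBackend helper are hypothetical names introduced here, and the backend stands in for whatever really writes to the Lucene index.

```java
// Hypothetical stand-in for the index writer; not part of fedoragsearch or Lucene.
interface IndexBackend {
    void addDocument(String pid);
    void optimize();
}

// Indexes every object immediately, but optimizes only after every Nth object.
class BatchedOptimizingIndexer {
    private final IndexBackend backend;
    private final int optimizeInterval;
    private int sinceLastOptimize = 0;

    BatchedOptimizingIndexer(IndexBackend backend, int optimizeInterval) {
        if (optimizeInterval < 1) {
            throw new IllegalArgumentException("optimizeInterval must be >= 1");
        }
        this.backend = backend;
        this.optimizeInterval = optimizeInterval;
    }

    void index(String pid) {
        backend.addDocument(pid);
        sinceLastOptimize++;
        // Only pay the cost of an optimize once per interval.
        if (sinceLastOptimize >= optimizeInterval) {
            backend.optimize();
            sinceLastOptimize = 0;
        }
    }
}

// Counts calls so the effect of the interval can be observed.
class CountingBackend implements IndexBackend {
    int documents = 0;
    int optimizeCalls = 0;

    public void addDocument(String pid) { documents++; }
    public void optimize() { optimizeCalls++; }
}

class BatchedOptimizeDemo {
    public static void main(String[] args) {
        CountingBackend backend = new CountingBackend();
        BatchedOptimizingIndexer indexer = new BatchedOptimizingIndexer(backend, 1000);
        for (int i = 0; i < 1500; i++) {
            indexer.index("test:" + i);
        }
        // prints "1500 documents, 1 optimize call(s)"
        System.out.println(backend.documents + " documents, "
                + backend.optimizeCalls + " optimize call(s)");
    }
}
```

With an interval of 1000, the 1500 objects in our test would trigger one optimize instead of 1500. Objects indexed after the last full interval remain unoptimized until the next one completes.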
Perhaps fedoragsearch should have a configuration option that lets us choose whether the index is optimized after each single item is indexed. In addition, fedoragsearch could offer an 'optimizeIndex' call.
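One way the proposed option and call could look is sketched below. Everything here is an assumption made for illustration: the property name optimizeAfterEachUpdate, the SearchIndex interface and the ConfigurableIndexer class do not exist in fedoragsearch. Only the name optimizeIndex is taken from the proposal itself.

```java
import java.util.Properties;

// Hypothetical stand-in for the underlying index; not the fedoragsearch or Lucene API.
interface SearchIndex {
    void addDocument(String pid);
    void optimize();
}

class ConfigurableIndexer {
    // Assumed property name, chosen for this sketch only.
    static final String OPTIMIZE_KEY = "optimizeAfterEachUpdate";

    private final SearchIndex index;
    private final boolean optimizeAfterEachUpdate;

    ConfigurableIndexer(SearchIndex index, Properties config) {
        this.index = index;
        // Default to "true" so the existing behaviour is kept unless configured otherwise.
        this.optimizeAfterEachUpdate =
                Boolean.parseBoolean(config.getProperty(OPTIMIZE_KEY, "true"));
    }

    void index(String pid) {
        index.addDocument(pid);
        if (optimizeAfterEachUpdate) {
            index.optimize();
        }
    }

    // Explicit operation a client can call, for example once after a bulk load.
    void optimizeIndex() {
        index.optimize();
    }
}

// Counts calls so the behaviour of both settings can be checked.
class RecordingIndex implements SearchIndex {
    int documents = 0;
    int optimizeCalls = 0;

    public void addDocument(String pid) { documents++; }
    public void optimize() { optimizeCalls++; }
}
```

A bulk load such as our test would then set the option to false, index all objects, and call optimizeIndex once at the end.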