From: Leila G. <lm...@ag...> - 2017-03-23 18:19:32
Thanks Bob. I've done some more troubleshooting, and it looks like the issue is probably due to the indexing of spatial coordinates: when I don't index the geo fields, the import runs just fine. It appears there were some changes between Solr 4.2 and 5.5, and I may have to update the coordinate indexing code for VuFind. I'm looking into that some more and am hunting down the records with the coordinates that are probably causing the issue. I did find this thread, which may prove useful:

http://lucene.472066.n3.nabble.com/Spatial-Dataimport-full-import-results-in-OutOfMemory-for-a-rectangle-defining-a-line-tp4034928p4035372.html

The first error I see in solr.log, prior to the "org.apache.solr.common.SolrException: Exception writing document id 143179 to the index; possible analysis error." errors, is:

2017-03-23 16:25:28.380 INFO  (qtp2082400824-13) [   x:biblio] o.a.s.u.p.LogUpdateProcessorFactory [biblio] webapp=/solr path=/update params={wt=javabin&version=2}{add=[73231 (1562677701975736320), 73233 (1562677701979930624), 73234 (1562677701986222080), 73322 (1562677702001950720), 73323 (1562677702007193600), 73377 (1562677702014533632), 73380 (1562677702022922241), 73381 (1562677702025019392), 73382 (1562677702028165120), 73497 (1562677702030262272), ... (97 adds)]} 0 739185
2017-03-23 16:25:30.197 ERROR (qtp2082400824-13) [   x:biblio] o.a.s.s.HttpSolrCall null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
        at org.apache.solr.servlet.HttpSolrCall.sendError(HttpSolrCall.java:604)
        at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:473)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:225)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:183)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
        at org.eclipse.jetty.server.Server.handle(Server.java:499)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
        at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Java heap space
        at org.apache.lucene.spatial.prefix.tree.GeohashPrefixTree.stringToBytesPlus1(GeohashPrefixTree.java:92)
        at org.apache.lucene.spatial.prefix.tree.GeohashPrefixTree.access$000(GeohashPrefixTree.java:37)
        at org.apache.lucene.spatial.prefix.tree.GeohashPrefixTree$GhCell.<init>(GeohashPrefixTree.java:104)
        at org.apache.lucene.spatial.prefix.tree.GeohashPrefixTree$GhCell.getSubCells(GeohashPrefixTree.java:131)
        at org.apache.lucene.spatial.prefix.tree.LegacyCell.getNextLevelCells(LegacyCell.java:141)
        at org.apache.lucene.spatial.prefix.RecursivePrefixTreeStrategy.recursiveTraverseAndPrune(RecursivePrefixTreeStrategy.java:150)
        at org.apache.lucene.spatial.prefix.RecursivePrefixTreeStrategy.recursiveTraverseAndPrune(RecursivePrefixTreeStrategy.java:153)
        at org.apache.lucene.spatial.prefix.RecursivePrefixTreeStrategy.recursiveTraverseAndPrune(RecursivePrefixTreeStrategy.java:153)
        at org.apache.lucene.spatial.prefix.RecursivePrefixTreeStrategy.recursiveTraverseAndPrune(RecursivePrefixTreeStrategy.java:153)
        at org.apache.lucene.spatial.prefix.RecursivePrefixTreeStrategy.recursiveTraverseAndPrune(RecursivePrefixTreeStrategy.java:153)
        at org.apache.lucene.spatial.prefix.RecursivePrefixTreeStrategy.recursiveTraverseAndPrune(RecursivePrefixTreeStrategy.java:153)
        at org.apache.lucene.spatial.prefix.RecursivePrefixTreeStrategy.recursiveTraverseAndPrune(RecursivePrefixTreeStrategy.java:153)
        at org.apache.lucene.spatial.prefix.RecursivePrefixTreeStrategy.recursiveTraverseAndPrune(RecursivePrefixTreeStrategy.java:153)
        at org.apache.lucene.spatial.prefix.RecursivePrefixTreeStrategy.recursiveTraverseAndPrune(RecursivePrefixTreeStrategy.java:153)
        at org.apache.lucene.spatial.prefix.RecursivePrefixTreeStrategy.recursiveTraverseAndPrune(RecursivePrefixTreeStrategy.java:153)
        at org.apache.lucene.spatial.prefix.RecursivePrefixTreeStrategy.createCellIteratorToIndex(RecursivePrefixTreeStrategy.java:128)
        at org.apache.lucene.spatial.prefix.PrefixTreeStrategy.createIndexableFields(PrefixTreeStrategy.java:151)
        at org.apache.lucene.spatial.prefix.PrefixTreeStrategy.createIndexableFields(PrefixTreeStrategy.java:146)
        at org.apache.lucene.spatial.prefix.PrefixTreeStrategy.createIndexableFields(PrefixTreeStrategy.java:137)
        at org.apache.solr.schema.AbstractSpatialFieldType.createFields(AbstractSpatialFieldType.java:211)
        at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:47)
        at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:122)
        at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:82)
        at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:280)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:214)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:169)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:68)
        at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:931)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1086)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:709)
        at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
        at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:97)

- Leila

*From:* Robert Haschart [mailto:rh...@vi...]
*Sent:* Thursday, March 23, 2017 11:06 AM
*To:* sol...@go...; Leila Gonzales; vuf...@li...
*Subject:* Re: [solrmarc-tech] Re: [VuFind-Tech] VuFind 2.4.1 to 3.1.3: Indexing fails with Java heap space / out of memory errors

Weighing in on this part of Demian's message.
> I'm not sure if there is currently a way to adjust how many records are batched together before sending to Solr. Bob might be able to comment on that (and if it's possible, that would be a useful addition to the "other command-line options" wiki page).

The number that is sent in a chunk is 640 records. You can set the system property that controls this value on the command line like so:

-Dsolrmarc.indexer.chunksize=100

That system property is now on the wiki page Demian pointed to, along with two others that weren't previously documented.

-Bob Haschart

On 3/23/2017 8:07 AM, Demian Katz wrote:

Leila,

I'm copying this to solrmarc-tech in case Bob has anything to add.

The biggest difference between the SolrMarc in VuFind 2.4.1 and the SolrMarc in 3.1.3 is that the newer version posts more documents to Solr all at once, while the older version took a "one at a time" approach. I would guess that receiving large batches of records may be contributing to the memory error, though as Uwe said in his reply to you, I wouldn't expect 27,808 records to cause a problem; I've successfully indexed hundreds of thousands. Of course, if any of these records are extremely large and complex, that could make a difference.

It might be interesting to see whether using SolrMarc's multi-threaded mode and tweaking its thread settings makes any difference. You can read about thread options here:

https://github.com/solrmarc/solrmarc/wiki/Other-command-line-options

You can use these options through the EXTRA_SOLRMARC_SETTINGS environment variable, e.g.:

EXTRA_SOLRMARC_SETTINGS="-Dsolrmarc.solrj.threadcount=1 -Dsolrmarc.indexer.threadcount=2" ./import-marc.sh ...

(Note that the values in that example aren't a suggestion, just an example.)

I'm not sure if there is currently a way to adjust how many records are batched together before sending to Solr. Bob might be able to comment on that (and if it's possible, that would be a useful addition to the "other command-line options" wiki page).

Good luck! Let us know if you need more help!

- Demian

------------------------------

*From:* Leila Gonzales <lm...@ag...>
*Sent:* Thursday, March 23, 2017 12:09 AM
*To:* vuf...@li...
*Subject:* [VuFind-Tech] VuFind 2.4.1 to 3.1.3: Indexing fails with Java heap space / out of memory errors

Hi all,

I'm upgrading from VuFind 2.4.1 to 3.1.3, and on indexing my .mrc files I am running into Java heap memory errors. The file I'm loading has only 27,808 records in it, and I had no problem indexing it with VuFind 2.4.1, so I'm wondering what's causing the issue since upgrading to 3.1.3.

From what I can tell, it seems that the import-marc.sh script is dying at the commit stage:

Now Importing /incoming/processed/importrecords.mrc ...
Mar 22, 23:43:55 /usr/lib/jvm/default-java/bin/java -Xms2G -Xmx2G -XX:+UseParallelGC -XX:+AggressiveOpts -XX:NewRatio=5 -DentityExpansionLimit=0 -Duser.timezone=UTC -jar /usr/local/vufind/import/solrmarc_core_3.0.6.jar /usr/local/vufind/local/import/import.properties -solrj /usr/local/vufind/solr/vendor/dist/solrj-lib /incoming/processed/importrecords.mrc
0  [main] DEBUG org.solrmarc.driver.ConfigDriver - Using config /usr/local/vufind/local/import/import.properties to initialize SolrMarc
3  [main] DEBUG org.solrmarc.tools.PropertyUtils - Opening file: /usr/local/vufind/local/import/import.properties
9  [main] INFO  org.solrmarc.driver.ConfigDriver - Effective Command Line is:
10 [main] INFO  org.solrmarc.driver.ConfigDriver - java -jar solrmarc_core.jar IndexDriver -reader_opts import.properties -dir /usr/local/vufind/local/import|/usr/local/vufind/import;local/import -config "marc.properties, marc_local.properties" -solrURL http://localhost:8080/solr/biblio/update -solrj /usr/local/vufind/solr/vendor/dist/solrj-lib /incoming/processed/importrecords.mrc
INFO  [main] (ValueIndexerFactory.java:116) - Using directory: /usr/local/vufind/import/index_java as location of java sources
INFO  [main] (PropertyUtils.java:313) - Opening file (instead of 2 other options): /usr/local/vufind/local/import/import.properties
DEBUG [main] (SolrCoreLoader.java:80) - Found Solrj class org.apache.solr.client.solrj.impl.HttpSolrClient
INFO  [main] (IndexDriver.java:165) - Reading and compiling index specifications: marc.properties, marc_local.properties
INFO  [main] (IndexDriver.java:229) - Opening index spec file: /usr/local/vufind/import/marc.properties
INFO  [main] (IndexDriver.java:229) - Opening index spec file: /usr/local/vufind/import/marc_local.properties
DEBUG [main] (ScriptValueExtractorFactory.java:41) - Load bean shell script: crrelDbaseName.bsh
DEBUG [main] (ScriptValueExtractorFactory.java:41) - Load bean shell script: georefFormat.bsh
DEBUG [main] (ScriptValueExtractorFactory.java:41) - Load bean shell script: georefPublisher.bsh
DEBUG [main] (ScriptValueExtractorFactory.java:41) - Load bean shell script: georefContainerInfo.bsh
DEBUG [main] (ScriptValueExtractorFactory.java:41) - Load bean shell script: georefKeywordTerms.bsh
DEBUG [main] (ScriptValueExtractorFactory.java:41) - Load bean shell script: georefCategoryCodes.bsh
DEBUG [main] (ScriptValueExtractorFactory.java:41) - Load bean shell script: georefNote.bsh
DEBUG [main] (ScriptValueExtractorFactory.java:41) - Load bean shell script: georefDOIURL.bsh
INFO  [main] (IndexDriver.java:93) - Opening input files: [/incoming/processed/importrecords.mrc]
INFO  [main] (ThreadedIndexer.java:221) - Done with all indexing, finishing writing records to solr
ERROR [SolrUpdateOnError_143170_143191] (Indexer.java:421) - Failed on single doc with id : 143179
ERROR [SolrUpdateOnError_143170_143191] (Indexer.java:431) - Error from server at http://localhost:8080/solr/biblio: Exception writing document id 143179 to the index; possible analysis error.
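When a batch fails like this, one way to narrow the problem to individual records is to combine the system properties discussed elsewhere in this thread (solrmarc.indexer.chunksize from Bob's reply, the thread-count options from Demian's) so that records are posted one at a time. This is only a sketch; the values and the record path are illustrative, not recommendations:

```shell
#!/bin/sh
# Sketch: re-run the import single-threaded with a minimal chunk size so
# each record is posted to Solr individually, making the failing record
# easy to spot in the log. Values are illustrative.
EXTRA_SOLRMARC_SETTINGS="-Dsolrmarc.indexer.chunksize=1 -Dsolrmarc.solrj.threadcount=1 -Dsolrmarc.indexer.threadcount=1"
export EXTRA_SOLRMARC_SETTINGS
echo "EXTRA_SOLRMARC_SETTINGS=$EXTRA_SOLRMARC_SETTINGS"
# Then, as in Demian's example:
# ./import-marc.sh /incoming/processed/importrecords.mrc
```

This trades away all of the batching speedup, so it is only worth doing while hunting for the offending records.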
The errors I am getting in the Solr Admin UI logs are:

org.apache.solr.common.SolrException: Exception writing document id 143179 to the index; possible analysis error.
...
Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
...
Caused by: java.lang.OutOfMemoryError: Java heap space

I've tried modifying the following files, but nothing has worked so far:

solr.sh: set SOLR_HEAP to 2G
import-marc.sh: set INDEX_OPTIONS='-Xms2G -Xmx2G -XX:+UseParallelGC -XX:+AggressiveOpts -XX:NewRatio=5 -DentityExpansionLimit=0'

(My previous settings for VuFind 2.4.1 in vufind.sh were JAVA_OPTIONS="-server -Xms1024m -Xmx1024m -XX:+UseParallelGC -XX:NewRatio=5", and I tried setting the Xms/Xmx to 1024M in import-marc.sh and solr.sh, but to no avail.)

And my current settings are:

Using Solr root directory: /usr/local/vufind/solr/vendor
Using Java: /usr/lib/jvm/default-java/bin/java
java version "1.7.0_79"
OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79-2.5.5-0ubuntu0.14.04.2)
OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode)
Backing up /usr/local/vufind/solr/vufind/logs/solr.log
Backing up /usr/local/vufind/solr/vufind/logs/solr_gc.log
Starting Solr using the following settings:
    JAVA            = /usr/lib/jvm/default-java/bin/java
    SOLR_SERVER_DIR = /usr/local/vufind/solr/vendor/server
    SOLR_HOME       = /usr/local/vufind/solr/vufind
    SOLR_HOST       =
    SOLR_PORT       = 8080
    STOP_PORT       = 7080
    JAVA_MEM_OPTS   = -Xms2G -Xmx2G
    GC_TUNE         = -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark -XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000 -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -XX:CMSFullGCsBeforeCompaction=1 -XX:CMSTriggerPermRatio=80
    GC_LOG_OPTS     = -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -Xloggc:/usr/local/vufind/solr/vufind/logs/solr_gc.log
    SOLR_TIMEZONE   = UTC
    SOLR_OPTS       = -Xss256k
    SOLR_ADDL_ARGS  = -Dsolr.log=/usr/local/vufind/solr/vufind/logs

Any ideas on what to do next? Should I try adjusting the GC_TUNE settings in solr/vendor/bin/solr.in.sh? Is there somewhere else I should be looking? Thanks for any guidance you can send my way.

Kind regards,
Leila
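A point worth separating out when reading this thread: two different JVMs are involved, each with its own heap, and raising one does not affect the other. A minimal sketch using the variable names from the scripts quoted in this thread (the sizes are illustrative, not recommendations):

```shell
#!/bin/sh
# Sketch: the two independent heaps in a VuFind import. Sizes illustrative.

# 1. The Solr server's heap, set via solr.sh / solr.in.sh (it shows up as
#    JAVA_MEM_OPTS in the startup banner above). This is the JVM that threw
#    the GeohashPrefixTree OutOfMemoryError while analyzing the document.
SOLR_HEAP="2G"

# 2. The SolrMarc import JVM launched by import-marc.sh, controlled by
#    INDEX_OPTIONS. Raising this does nothing for an OOM inside Solr itself.
INDEX_OPTIONS="-Xms2G -Xmx2G -XX:NewRatio=5 -DentityExpansionLimit=0"

echo "Solr server heap:  $SOLR_HEAP"
echo "SolrMarc JVM opts: $INDEX_OPTIONS"
```

Since the stack trace in this thread originates inside Solr's update handler, the Solr-side heap (or the spatial field configuration) is the relevant knob, not the SolrMarc one.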