From: Leila G. <lm...@ag...> - 2017-03-23 14:58:01
|
Hi Demian,

To call the java code from the marc_local.properties, would I use the same syntax (script(NAME.java),method), and would I put the .java code in the index_scripts directory?

Thanks!
Leila

*From:* Demian Katz [mailto:dem...@vi...]
*Sent:* Thursday, March 23, 2017 7:44 AM
*To:* Leila Gonzales; vuf...@li...
*Cc:* sol...@go...
*Subject:* Re: [VuFind-Tech] VuFind 2.4.1 to 3.1.3: Indexing fails with Java heap space / out of memory errors

It may pay off to convert the BeanShell into pure Java to take advantage of the new SolrMarc. Let me know if I can be of any help with this process. There are some notes here:

https://vufind.org/wiki/indexing:solrmarc:custom_java_best_practices

As always, I'm happy to elaborate as needed.

- Demian

------------------------------
*From:* Leila Gonzales <lm...@ag...>
*Sent:* Thursday, March 23, 2017 10:31 AM
*To:* Demian Katz; vuf...@li...
*Cc:* sol...@go...
*Subject:* RE: [VuFind-Tech] VuFind 2.4.1 to 3.1.3: Indexing fails with Java heap space / out of memory errors

Thank you very much Uwe and Demian.

I did some more troubleshooting this morning and turned off indexing of the geographic search/display fields, re-ran the import, and Solr indexed everything fine in a few seconds. I’m going to see if the issue is with the geographic indexing code or if it’s with one of my records in the batch that’s causing the issue. I am suspecting (and hoping) that it’s the latter.

Cheers,
Leila

*From:* Demian Katz [mailto:dem...@vi...]
*Sent:* Thursday, March 23, 2017 5:07 AM
*To:* Leila Gonzales; vuf...@li...
*Cc:* sol...@go...
*Subject:* Re: [VuFind-Tech] VuFind 2.4.1 to 3.1.3: Indexing fails with Java heap space / out of memory errors

Leila,

I'm copying this to solrmarc-tech in case Bob has anything to add.

The biggest difference between the SolrMarc in VuFind 2.4.1 and the SolrMarc in 3.1.3 is that the newer version posts more documents to Solr all at once, while the older version took a "one at a time" approach.
I would guess that receiving large batches of records may be contributing to the memory error, though as Uwe said in his reply to you, I wouldn't expect 27,808 records to cause a problem. I've successfully indexed hundreds of thousands. Of course, if any of these records are extremely large and complex, that could make a difference.

It might be interesting to see if using SolrMarc's multi-threaded mode makes any difference -- I wonder if tweaking the thread settings would help. You can read about thread options here:

https://github.com/solrmarc/solrmarc/wiki/Other-command-line-options

You can use these options through the EXTRA_SOLRMARC_SETTINGS environment variable -- e.g.

EXTRA_SOLRMARC_SETTINGS="-Dsolrmarc.solrj.threadcount=1 -Dsolrmarc.indexer.threadcount=2" ./import-marc.sh ...

(Note that the values in that example are illustrative, not a recommendation.)

I'm not sure if there is currently a way to adjust how many records are batched together before sending to Solr. Bob might be able to comment on that (and if it's possible, that would be a useful addition to the "other command-line options" wiki page).

Good luck! Let us know if you need more help!

- Demian

------------------------------
*From:* Leila Gonzales <lm...@ag...>
*Sent:* Thursday, March 23, 2017 12:09 AM
*To:* vuf...@li...
*Subject:* [VuFind-Tech] VuFind 2.4.1 to 3.1.3: Indexing fails with Java heap space / out of memory errors

Hi all,

I'm upgrading from VuFind 2.4.1 to 3.1.3, and on indexing my .mrc files, I am running into Java heap memory errors.
The file I'm loading has only 27,808 records in it, and I had no problem indexing it with VuFind 2.4.1, so I’m wondering what’s causing the issue since upgrading to 3.1.3.

From what I can tell, it seems that the import-marc.sh script is dying on the commit stage.

Now Importing /incoming/processed/importrecords.mrc ...
Mar 22, 23:43:55 /usr/lib/jvm/default-java/bin/java -Xms2G -Xmx2G -XX:+UseParallelGC -XX:+AggressiveOpts -XX:NewRatio=5 -DentityExpansionLimit=0 -Duser.timezone=UTC -jar /usr/local/vufind/import/solrmarc_core_3.0.6.jar /usr/local/vufind/local/import/import.properties -solrj /usr/local/vufind/solr/vendor/dist/solrj-lib /incoming/processed/importrecords.mrc
0 [main] DEBUG org.solrmarc.driver.ConfigDriver - Using config /usr/local/vufind/local/import/import.properties to initialize SolrMarc
3 [main] DEBUG org.solrmarc.tools.PropertyUtils - Opening file: /usr/local/vufind/local/import/import.properties
9 [main] INFO org.solrmarc.driver.ConfigDriver - Effective Command Line is:
10 [main] INFO org.solrmarc.driver.ConfigDriver - java -jar solrmarc_core.jar IndexDriver -reader_opts import.properties -dir /usr/local/vufind/local/import|/usr/local/vufind/import;local/import -config "marc.properties, marc_local.properties" -solrURL http://localhost:8080/solr/biblio/update -solrj /usr/local/vufind/solr/vendor/dist/solrj-lib /incoming/processed/importrecords.mrc
INFO [main] (ValueIndexerFactory.java:116) - Using directory: /usr/local/vufind/import/index_java as location of java sources
INFO [main] (PropertyUtils.java:313) - Opening file (instead of 2 other options): /usr/local/vufind/local/import/import.properties
DEBUG [main] (SolrCoreLoader.java:80) - Found Solrj class org.apache.solr.client.solrj.impl.HttpSolrClient
INFO [main] (IndexDriver.java:165) - Reading and compiling index specifications: marc.properties, marc_local.properties
INFO [main] (IndexDriver.java:229) - Opening index spec file: /usr/local/vufind/import/marc.properties
INFO [main] (IndexDriver.java:229) - Opening index spec file: /usr/local/vufind/import/marc_local.properties
DEBUG [main] (ScriptValueExtractorFactory.java:41) - Load bean shell script: crrelDbaseName.bsh
DEBUG [main] (ScriptValueExtractorFactory.java:41) - Load bean shell script: georefFormat.bsh
DEBUG [main] (ScriptValueExtractorFactory.java:41) - Load bean shell script: georefPublisher.bsh
DEBUG [main] (ScriptValueExtractorFactory.java:41) - Load bean shell script: georefContainerInfo.bsh
DEBUG [main] (ScriptValueExtractorFactory.java:41) - Load bean shell script: georefKeywordTerms.bsh
DEBUG [main] (ScriptValueExtractorFactory.java:41) - Load bean shell script: georefCategoryCodes.bsh
DEBUG [main] (ScriptValueExtractorFactory.java:41) - Load bean shell script: georefNote.bsh
DEBUG [main] (ScriptValueExtractorFactory.java:41) - Load bean shell script: georefDOIURL.bsh
INFO [main] (IndexDriver.java:93) - Opening input files: [/incoming/processed/importrecords.mrc]
INFO [main] (ThreadedIndexer.java:221) - Done with all indexing, finishing writing records to solr
ERROR [SolrUpdateOnError_ 143170_ 143191] (Indexer.java:421) - Failed on single doc with id : 143179
ERROR [SolrUpdateOnError_ 143170_ 143191] (Indexer.java:431) - Error from server at http://localhost:8080/solr/biblio: Exception writing document id 143179 to the index; possible analysis error.

The errors I am getting in the SolrAdmin UI logs are:

org.apache.solr.common.SolrException: Exception writing document id 143179 to the index; possible analysis error.
...
Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
...
Caused by: java.lang.OutOfMemoryError: Java heap space

I've tried modifying the following files, but nothing has worked so far:

solr.sh: set SOLR_HEAP to 2G
import-marc.sh: set INDEX_OPTIONS='-Xms2G -Xmx2G -XX:+UseParallelGC -XX:+AggressiveOpts -XX:NewRatio=5 -DentityExpansionLimit=0'

(My previous settings for VuFind 2.4.1 for vufind.sh were JAVA_OPTIONS="-server -Xms1024m -Xmx1024m -XX:+UseParallelGC -XX:NewRatio=5", and I tried setting the Xms / Xmx to 1024M in import-marc.sh and solr.sh, but to no avail.)

And my current settings are:

Using Solr root directory: /usr/local/vufind/solr/vendor
Using Java: /usr/lib/jvm/default-java/bin/java
java version "1.7.0_79"
OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79-2.5.5-0ubuntu0.14.04.2)
OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode)
Backing up /usr/local/vufind/solr/vufind/logs/solr.log
Backing up /usr/local/vufind/solr/vufind/logs/solr_gc.log

Starting Solr using the following settings:
JAVA = /usr/lib/jvm/default-java/bin/java
SOLR_SERVER_DIR = /usr/local/vufind/solr/vendor/server
SOLR_HOME = /usr/local/vufind/solr/vufind
SOLR_HOST =
SOLR_PORT = 8080
STOP_PORT = 7080
JAVA_MEM_OPTS = -Xms2G -Xmx2G
GC_TUNE = -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark -XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000 -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -XX:CMSFullGCsBeforeCompaction=1 -XX:CMSTriggerPermRatio=80
GC_LOG_OPTS = -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -Xloggc:/usr/local/vufind/solr/vufind/logs/solr_gc.log
SOLR_TIMEZONE = UTC
SOLR_OPTS = -Xss256k
SOLR_ADDL_ARGS = -Dsolr.log=/usr/local/vufind/solr/vufind/logs

Any ideas on what to do next?
Should I try adjusting the GC_TUNE settings in solr/vendor/bin/solr.in.sh? Is there somewhere else I should be looking?

Thanks for any guidance you can send my way.

Kind regards,
Leila |
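
The thread raises the possibility that a single record in the batch, rather than the indexing code, triggers the failure. One way to test that is to split the .mrc file into smaller pieces and import each piece separately. The following is only a rough sketch: it assumes the file is binary MARC (ISO 2709), in which every record ends with the byte 0x1D, and it does not validate the records in any way. The function names and the part_NNNN.mrc output names are made up for illustration and are not part of VuFind or SolrMarc.

```python
from pathlib import Path
from typing import List

RECORD_TERMINATOR = b"\x1d"  # end-of-record byte in binary MARC (ISO 2709)


def split_marc(data: bytes, chunk_size: int) -> List[bytes]:
    """Group the records in raw binary MARC data into chunks of chunk_size records."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    # Splitting on the terminator leaves an empty piece after the last record; skip it.
    records = [piece + RECORD_TERMINATOR
               for piece in data.split(RECORD_TERMINATOR) if piece.strip()]
    return [b"".join(records[i:i + chunk_size])
            for i in range(0, len(records), chunk_size)]


def write_chunks(source: str, out_dir: str, chunk_size: int) -> List[Path]:
    """Write each chunk to out_dir as part_0001.mrc, part_0002.mrc, ... (hypothetical names)."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    chunks = split_marc(Path(source).read_bytes(), chunk_size)
    for number, chunk in enumerate(chunks, start=1):
        path = target / f"part_{number:04d}.mrc"
        path.write_bytes(chunk)
        paths.append(path)
    return paths
```

For example, write_chunks("/incoming/processed/importrecords.mrc", "/tmp/marc_parts", 1000) would produce files of up to 1,000 records each (the output directory and chunk size here are arbitrary). Running import-marc.sh on each part in turn, and re-splitting whichever part fails with a smaller chunk size, should narrow the problem down to a handful of records, if a record is in fact the cause.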