From: Leila G. <lm...@ag...> - 2017-03-23 15:58:15
Thank you so much Demian!

Leila

*From:* sol...@go... [mailto: sol...@go...] *On Behalf Of* Demian Katz
*Sent:* Thursday, March 23, 2017 8:57 AM
*To:* sol...@go...; vuf...@li...
*Subject:* Re: [solrmarc-tech] Re: [VuFind-Tech] VuFind 2.4.1 to 3.1.3: Indexing fails with Java heap space / out of memory errors

Leila,

You might find it helpful to look at how I have refactored the VuFindIndexer code for VuFind 4.0 -- instead of extending the SolrIndexer, I just created a bunch of stand-alone classes which access an instance of the indexer through a singleton pattern as needed. You should be able to use any of the classes here as a starting point example:

https://github.com/vufind-org/vufind/tree/master/import/index_java/src/org/vufind/index

If you're not sure how to do any particular thing, let me know and I can point you to a more specific example. I suspect that comparing my version of your geo code to the beanshell version should be pretty enlightening.

- Demian

------------------------------
*From:* sol...@go... <sol...@go...> on behalf of Leila Gonzales <lm...@ag...>
*Sent:* Thursday, March 23, 2017 11:06 AM
*To:* sol...@go...; vuf...@li...
*Subject:* RE: [solrmarc-tech] Re: [VuFind-Tech] VuFind 2.4.1 to 3.1.3: Indexing fails with Java heap space / out of memory errors

So I could basically copy the methods out of the VuFindIndexer.java file, rename them in a new .java file (say, “Coordinate.java”), and put that file in the vufind/import/index_java/src/org/solrmarc/index/ directory. As long as the Coordinate.java file extends the SolrIndexer, I should be good to go, correct?

Thanks for the help. I really appreciate it.

Leila

*From:* sol...@go... [mailto: sol...@go...] *On Behalf Of* Demian Katz
*Sent:* Thursday, March 23, 2017 8:01 AM
*To:* Leila Gonzales; vuf...@li...
*Cc:* sol...@go...
*Subject:* [solrmarc-tech] Re: [VuFind-Tech] VuFind 2.4.1 to 3.1.3: Indexing fails with Java heap space / out of memory errors

Actually, you use syntax like you would for built-in custom methods:

custom, method

You don't have to specify which class it is in as long as each method name is completely unique.

- Demian

------------------------------
*From:* Leila Gonzales <lm...@ag...>
*Sent:* Thursday, March 23, 2017 10:57 AM
*To:* Demian Katz; vuf...@li...
*Cc:* sol...@go...
*Subject:* RE: [VuFind-Tech] VuFind 2.4.1 to 3.1.3: Indexing fails with Java heap space / out of memory errors

Hi Demian,

To call the java code from the marc_local.properties, would I use the same syntax (script(NAME.java),method), and would I put the .java code in the index_scripts directory?

Thanks!

Leila

*From:* Demian Katz [mailto:dem...@vi...]
*Sent:* Thursday, March 23, 2017 7:44 AM
*To:* Leila Gonzales; vuf...@li...
*Cc:* sol...@go...
*Subject:* Re: [VuFind-Tech] VuFind 2.4.1 to 3.1.3: Indexing fails with Java heap space / out of memory errors

It may pay off to convert the BeanShell into pure Java to take advantage of the new SolrMarc. Let me know if I can be of any help with this process. There are some notes here:

https://vufind.org/wiki/indexing:solrmarc:custom_java_best_practices

As always, I'm happy to elaborate as needed.

- Demian

------------------------------
*From:* Leila Gonzales <lm...@ag...>
*Sent:* Thursday, March 23, 2017 10:31 AM
*To:* Demian Katz; vuf...@li...
*Cc:* sol...@go...
*Subject:* RE: [VuFind-Tech] VuFind 2.4.1 to 3.1.3: Indexing fails with Java heap space / out of memory errors

Thank you very much Uwe and Demian.
I did some more troubleshooting this morning and turned off indexing of the geographic search/display fields, re-ran the import, and Solr indexed everything fine in a few seconds. I’m going to see if the issue is with the geographic indexing code or if it’s with one of my records in the batch that’s causing the issue. I am suspecting (and hoping) that it’s the latter.

Cheers,

Leila

*From:* Demian Katz [mailto:dem...@vi...]
*Sent:* Thursday, March 23, 2017 5:07 AM
*To:* Leila Gonzales; vuf...@li...
*Cc:* sol...@go...
*Subject:* Re: [VuFind-Tech] VuFind 2.4.1 to 3.1.3: Indexing fails with Java heap space / out of memory errors

Leila,

I'm copying this to solrmarc-tech in case Bob has anything to add.

The biggest difference between the SolrMarc in VuFind 2.4.1 and the SolrMarc in 3.1.3 is that the newer version posts more documents to Solr all at once, while the older version took a "one at a time" approach. I would guess that receiving large batches of records may be contributing to the memory error, though as Uwe said in his reply to you, I wouldn't expect 27,808 records to cause a problem. I've successfully indexed hundreds of thousands. Of course, if any of these records are extremely large and complex, that could make a difference.

It might be interesting to see if using SolrMarc's multi-threaded mode makes any difference -- I wonder if tweaking the thread settings could make a difference. You can read about thread options here:

https://github.com/solrmarc/solrmarc/wiki/Other-command-line-options

You can use these options through the EXTRA_SOLRMARC_SETTINGS environment variable -- e.g.
EXTRA_SOLRMARC_SETTINGS="-Dsolrmarc.solrj.threadcount=1 -Dsolrmarc.indexer.threadcount=2" ./import-marc.sh ...

(Note that the values in that example aren't a suggestion, just an example).

I'm not sure if there is currently a way to adjust how many records are batched together before sending to Solr. Bob might be able to comment on that (and if it's possible, that would be a useful addition to the "other command-line options" wiki page).

Good luck! Let us know if you need more help!

- Demian

------------------------------
*From:* Leila Gonzales <lm...@ag...>
*Sent:* Thursday, March 23, 2017 12:09 AM
*To:* vuf...@li...
*Subject:* [VuFind-Tech] VuFind 2.4.1 to 3.1.3: Indexing fails with Java heap space / out of memory errors

Hi all,

I'm upgrading from VuFind 2.4.1 to 3.1.3, and on indexing my .mrc files, I am running into Java heap memory errors. The file I'm loading has only 27,808 records in it, and I had no problem indexing it with VuFind 2.4.1, so I’m wondering what’s causing the issue since upgrading to 3.1.3.

From what I can tell, it seems that the import-marc.sh script is dying on the commit stage.

Now Importing /incoming/processed/importrecords.mrc ...
Mar 22, 23:43:55 /usr/lib/jvm/default-java/bin/java -Xms2G -Xmx2G -XX:+UseParallelGC -XX:+AggressiveOpts -XX:NewRatio=5 -DentityExpansionLimit=0 -Duser.timezone=UTC -jar /usr/local/vufind/import/solrmarc_core_3.0.6.jar /usr/local/vufind/local/import/import.properties -solrj /usr/local/vufind/solr/vendor/dist/solrj-lib /incoming/processed/importrecords.mrc
0 [main] DEBUG org.solrmarc.driver.ConfigDriver - Using config /usr/local/vufind/local/import/import.properties to initialize SolrMarc
3 [main] DEBUG org.solrmarc.tools.PropertyUtils - Opening file: /usr/local/vufind/local/import/import.properties
9 [main] INFO org.solrmarc.driver.ConfigDriver - Effective Command Line is:
10 [main] INFO org.solrmarc.driver.ConfigDriver - java -jar solrmarc_core.jar IndexDriver -reader_opts import.properties -dir /usr/local/vufind/local/import|/usr/local/vufind/import;local/import -config "marc.properties, marc_local.properties" -solrURL http://localhost:8080/solr/biblio/update -solrj /usr/local/vufind/solr/vendor/dist/solrj-lib /incoming/processed/importrecords.mrc
INFO [main] (ValueIndexerFactory.java:116) - Using directory: /usr/local/vufind/import/index_java as location of java sources
INFO [main] (PropertyUtils.java:313) - Opening file (instead of 2 other options): /usr/local/vufind/local/import/import.properties
DEBUG [main] (SolrCoreLoader.java:80) - Found Solrj class org.apache.solr.client.solrj.impl.HttpSolrClient
INFO [main] (IndexDriver.java:165) - Reading and compiling index specifications: marc.properties, marc_local.properties
INFO [main] (IndexDriver.java:229) - Opening index spec file: /usr/local/vufind/import/marc.properties
INFO [main] (IndexDriver.java:229) - Opening index spec file: /usr/local/vufind/import/marc_local.properties
DEBUG [main] (ScriptValueExtractorFactory.java:41) - Load bean shell script: crrelDbaseName.bsh
DEBUG [main] (ScriptValueExtractorFactory.java:41) - Load bean shell script: georefFormat.bsh
DEBUG [main] (ScriptValueExtractorFactory.java:41) - Load bean shell script: georefPublisher.bsh
DEBUG [main] (ScriptValueExtractorFactory.java:41) - Load bean shell script: georefContainerInfo.bsh
DEBUG [main] (ScriptValueExtractorFactory.java:41) - Load bean shell script: georefKeywordTerms.bsh
DEBUG [main] (ScriptValueExtractorFactory.java:41) - Load bean shell script: georefCategoryCodes.bsh
DEBUG [main] (ScriptValueExtractorFactory.java:41) - Load bean shell script: georefNote.bsh
DEBUG [main] (ScriptValueExtractorFactory.java:41) - Load bean shell script: georefDOIURL.bsh
INFO [main] (IndexDriver.java:93) - Opening input files: [/incoming/processed/importrecords.mrc]
INFO [main] (ThreadedIndexer.java:221) - Done with all indexing, finishing writing records to solr
ERROR [SolrUpdateOnError_ 143170_ 143191] (Indexer.java:421) - Failed on single doc with id : 143179
ERROR [SolrUpdateOnError_ 143170_ 143191] (Indexer.java:431) - Error from server at http://localhost:8080/solr/biblio: Exception writing document id 143179 to the index; possible analysis error.

The errors I am getting in the SolrAdmin UI logs are:

org.apache.solr.common.SolrException: Exception writing document id 143179 to the index; possible analysis error.
...
Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
...
Caused by: java.lang.OutOfMemoryError: Java heap space

I've tried modifying the following files, but nothing has worked so far:

solr.sh: set SOLR_HEAP to 2G
import-marc.sh: set INDEX_OPTIONS='-Xms2G -Xmx2G -XX:+UseParallelGC -XX:+AggressiveOpts -XX:NewRatio=5 -DentityExpansionLimit=0'

(My previous settings for VuFind 2.4.1 for vufind.sh were JAVA_OPTIONS="-server -Xms1024m -Xmx1024m -XX:+UseParallelGC -XX:NewRatio=5", and I tried setting the Xms / Xmx to 1024M in import-marc.sh and solr.sh, but to no avail.)

And my current settings are:

Using Solr root directory: /usr/local/vufind/solr/vendor
Using Java: /usr/lib/jvm/default-java/bin/java
java version "1.7.0_79"
OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79-2.5.5-0ubuntu0.14.04.2)
OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode)
Backing up /usr/local/vufind/solr/vufind/logs/solr.log
Backing up /usr/local/vufind/solr/vufind/logs/solr_gc.log
Starting Solr using the following settings:
JAVA = /usr/lib/jvm/default-java/bin/java
SOLR_SERVER_DIR = /usr/local/vufind/solr/vendor/server
SOLR_HOME = /usr/local/vufind/solr/vufind
SOLR_HOST =
SOLR_PORT = 8080
STOP_PORT = 7080
JAVA_MEM_OPTS = -Xms2G -Xmx2G
GC_TUNE = -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark -XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000 -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -XX:CMSFullGCsBeforeCompaction=1 -XX:CMSTriggerPermRatio=80
GC_LOG_OPTS = -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -Xloggc:/usr/local/vufind/solr/vufind/logs/solr_gc.log
SOLR_TIMEZONE = UTC
SOLR_OPTS = -Xss256k
SOLR_ADDL_ARGS = -Dsolr.log=/usr/local/vufind/solr/vufind/logs

Any ideas on what to do next? Should I try adjusting the GC_TUNE settings in solr/vendor/bin/solr.in.sh? Is there somewhere else I should be looking? Thanks for any guidance you can send my way.
Kind regards,

Leila

--
You received this message because you are subscribed to the Google Groups "solrmarc-tech" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sol...@go....
To post to this group, send email to sol...@go....
Visit this group at https://groups.google.com/group/solrmarc-tech.
For more options, visit https://groups.google.com/d/optout.
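
In the thread, Leila plans to find out whether the geographic indexing code or a single record in the batch is triggering the failure. One way to narrow that down might look like the rough Python sketch below. It is not part of VuFind or SolrMarc, and the function names `failed_ids` and `split_marc` are made up for illustration. It assumes the log wording matches the "Failed on single doc with id" line quoted above, and that the input is a binary MARC file in which every record ends with the standard 0x1D record terminator.

```python
import os
import re

# Pattern copied from the SolrMarc log line quoted in the thread:
#   "Failed on single doc with id : 143179"
FAILED_DOC = re.compile(r"Failed on single doc with id : (\S+)")

# ISO 2709 / MARC 21 end-of-record byte.
RECORD_TERMINATOR = b"\x1d"


def failed_ids(log_text):
    """Return the record ids reported as failed, in order, without duplicates."""
    seen = []
    for match in FAILED_DOC.finditer(log_text):
        if match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


def split_marc(path, records_per_chunk, out_dir):
    """Split a binary MARC file into smaller files and return the new paths.

    The whole file is read into memory, which should be fine for tens of
    thousands of records but is not meant for very large files.
    """
    with open(path, "rb") as handle:
        data = handle.read()
    # Each record ends with the terminator; skip the empty tail after the last one.
    records = [r + RECORD_TERMINATOR for r in data.split(RECORD_TERMINATOR) if r.strip()]
    paths = []
    for start in range(0, len(records), records_per_chunk):
        chunk = os.path.join(out_dir, "chunk_%04d.mrc" % (start // records_per_chunk))
        with open(chunk, "wb") as out:
            out.write(b"".join(records[start:start + records_per_chunk]))
        paths.append(chunk)
    return paths
```

Each chunk file could then be passed to import-marc.sh in turn. If only one chunk reproduces the error, the problem is more likely a specific record. If every chunk fails, the indexing code itself becomes the stronger suspect.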