From: Leila G. <lm...@ag...> - 2017-03-26 02:31:50
Thank you Demian and Tod. I was finally able to figure out the issue, and I want to document it here in case anyone else runs into it before I submit the forthcoming PR.

The problem was that our files had coordinate pairs in which one coordinate was on the South pole (S900000) and the other was within 5 minutes of the South pole (i.e. S895900). (For some reason, there is no issue with the North pole.) Another case that failed during Solr indexing was where we had E001 and W000 in the west and east coordinates. From what I can tell from http://lucene.472066.n3.nabble.com/Spatial-Dataimport-full-import-results-in-OutOfMemory-for-a-rectangle-defining-a-line-td4034928.html#a4035372, the problem seems to be that Solr runs out of memory trying to create too many spatial grids, or something to that effect.

I'll be submitting a pull request that traps these cases in the getAllCoordinates routine (in location.bsh and VuFindIndexer.java) so that such coordinates don't get processed, and an error message is produced during indexing so that the user can fix the records. I also found some other minor bugs in the validateCoordinate routine (Solr supports longitudinal wrapping, so we no longer have to check for West > East), and in map_tab_ol.js, where coordinates that crossed the dateline were not being displayed properly.

Cheers,
Leila

*From:* Tod Olson [mailto:to...@uc...]
*Sent:* Saturday, March 25, 2017 10:43 AM
*To:* Demian Katz
*Cc:* vuf...@li...; sol...@go...
*Subject:* Re: [VuFind-Tech] [solrmarc-tech] RE: VuFind 2.4.1 to 3.1.3: Indexing fails with Java heap space / out of memory errors

Hi Leila,

On the Java version: in the very first message, $JAVA is set to /usr/lib/jvm/default-java/bin/java. I would guess that one of those directories is a symlink to a specific installed JVM distribution, so you might "ls -ld" each level.
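[Editor's aside: the pole-sliver trap Leila describes above might look roughly like the sketch below. The class name, method name, and the exact five-minute threshold are illustrative inventions, not the actual getAllCoordinates code from her PR.]

```java
// Hypothetical sketch of the pole-sliver trap described in Leila's message.
// The real fix belongs in getAllCoordinates (location.bsh / VuFindIndexer.java);
// names and the threshold here are illustrative only.
class CoordinateGuard {

    // Five arc-minutes expressed in decimal degrees.
    static final double FIVE_MINUTES = 5.0 / 60.0;

    /**
     * Returns false for a bounding box pinned to the South pole whose
     * opposite edge is within about five minutes of the pole (e.g.
     * S900000 paired with S895900), the shape that sends Solr's spatial
     * grid generation into an out-of-memory spiral. A similar trap
     * would be needed for the degenerate E001/W000 east-west case.
     */
    static boolean isIndexable(double northLat, double southLat) {
        if (southLat <= -90.0 && northLat <= -90.0 + FIVE_MINUTES) {
            return false; // degenerate sliver at the South pole: skip and report
        }
        return true;
    }

    public static void main(String[] args) {
        // S900000 paired with S895900 (about -89.9833 decimal degrees): skip it.
        System.out.println(isIndexable(-89.9833, -90.0)); // prints false
        // An ordinary bounding box is fine.
        System.out.println(isIndexable(45.0, -45.0));     // prints true
    }
}
```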
There should be a way to change the default through whatever package manager controls your Java installation. Without having to chase that down, you could try explicitly setting JAVA_HOME to the base directory of your Java 1.8 distro and see if that makes a difference.

Best,
-Tod

On Mar 25, 2017, at 7:27 AM, Demian Katz <dem...@vi...> wrote:

Can you easily split the offending files into chunks? I'd be interested to see if there is a particular chunk size that always works for these records, or if by splitting the files you can narrow things down to a particular run of records that is related to the problem. I realize that the facts that the ID is always different and that larger files work correctly argue against a particular offending record and a particular size limit, but I still think the chunking approach might provide some additional clues....

- Demian

------------------------------
*From:* sol...@go... on behalf of Leila Gonzales <lm...@ag...>
*Sent:* Saturday, March 25, 2017 4:25 AM
*To:* sol...@go...; vuf...@li...
*Subject:* [solrmarc-tech] RE: VuFind 2.4.1 to 3.1.3: Indexing fails with Java heap space / out of memory errors

Hi everyone,

I just wanted to report back that the issue lies with two specific files, and not with the location.bsh/indexing routines, so that's the good news. For some reason, only two of my .mrc files are having the issue. One is 27,000 records and the other is 93,000 records, yet I am able to index another .mrc file with coordinate data that has ~200,000 records with no issues. Furthermore, the indexing problems for these two files only occur when I try to index the coordinate field. I've checked the data in the coordinate fields, but there is nothing special in terms of odd coordinate pairs or typos, and all of the coordinate combinations used in these files have been successfully indexed in other files on VuFind 3.1.3. Also, the indexing stage itself completes fine.
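[Editor's aside: Tod's suggestion above, chasing the default-java symlink chain and setting JAVA_HOME explicitly, is relevant to Leila's stuck-on-Java-7 problem below. A rough sketch; the Java 8 directory shown is an example and will differ by distribution:]

```shell
# Resolve each level of the default-java path to see which JVM it really is.
ls -ld /usr/lib/jvm/default-java
readlink -f /usr/lib/jvm/default-java/bin/java

# Explicitly select the Java 8 install for this shell before importing.
# (Example path; substitute your actual 1.8 directory.)
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH="$JAVA_HOME/bin:$PATH"
java -version    # should now report 1.8.0_x
```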
The out-of-memory errors happen when the records are sent to Solr at the commit stage. There is also no consistent failure point: the file stops indexing at a different set of records each time, which makes it very difficult to say which record or set of records is causing the issue.

I've also tried the following, all to no avail:

1. Changing the -Dsolrmarc.indexer.chunksize option (tried 1, 5, 50, 100, 500)
2. Changing the autoCommit time to 60 sec
3. Changing the number of threads: -Dsolrmarc.solrj.threadcount=8
4. Upgrading the JVM to Java 8 (java version "1.8.0_121", Java(TM) SE Runtime Environment (build 1.8.0_121-b13)); the catch here is that I can't get Solr to recognize the Java upgrade, so it still points to the Java 7 instance.

The error I'm consistently getting is the following (except that the document id is never the same one!):

2017-03-25 07:49:05.143 ERROR (qtp1385340628-14) [ x:biblio] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Exception writing document id 120336 to the index; possible analysis error. ….
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.util.fst.BytesStore.writeByte(BytesStore.java:91)
    at org.apache.lucene.util.fst.FST.<init>(FST.java:295)
    at org.apache.lucene.util.fst.Builder.<init>(Builder.java:172)
    at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$PendingBlock.compileIndex(BlockTreeTermsWriter.java:594)
    at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.writeBlocks(BlockTreeTermsWriter.java:775)
    at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.pushTerm(BlockTreeTermsWriter.java:1085)
    at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.write(BlockTreeTermsWriter.java:1046)
    at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:456)
    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.write(PerFieldPostingsFormat.java:198)
    at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:107)
    at org.apache.lucene.index.DefaultIndexingChain.flush(DefaultIndexingChain.java:126)
    at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:422)
    at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:503)
    at org.apache.lucene.index.DocumentsWriter.preUpdate(DocumentsWriter.java:357)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:436)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1477)
    at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:282)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:214)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:169)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:68)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:931)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1086)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:709)
    at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
    at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:97)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:179)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:135)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:260)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:121)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:225)
    at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:145)

Uwe, you mentioned sending along the solr.log and solr_gc.log. I'm happy to send those to you off-list if you have a chance to look at them. Thanks again everyone for any help you can provide or suggestions for where I should look next.

Kind regards,
Leila

*From:* Robert Haschart [mailto:rh...@vi...]
*Sent:* Thursday, March 23, 2017 11:06 AM
*To:* sol...@go...; Leila Gonzales; vuf...@li...
*Subject:* Re: [solrmarc-tech] Re: [VuFind-Tech] VuFind 2.4.1 to 3.1.3: Indexing fails with Java heap space / out of memory errors

Weighing in on this part of Demian's message: "I'm not sure if there is currently a way to adjust how many records are batched together before sending to Solr. Bob might be able to comment on that (and if it's possible, that would be a useful addition to the 'other command-line options' wiki page)."
The number of records sent in a chunk is 640. You can set the system property that controls this value on the command line like so:

-Dsolrmarc.indexer.chunksize=100

That system property (and two others that weren't documented) are now on the wiki page Demian pointed to.

-Bob Haschart

On 3/23/2017 8:07 AM, Demian Katz wrote:

Leila,

I'm copying this to solrmarc-tech in case Bob has anything to add.

The biggest difference between the SolrMarc in VuFind 2.4.1 and the SolrMarc in 3.1.3 is that the newer version posts documents to Solr in batches, while the older version took a "one at a time" approach. I would guess that receiving large batches of records may be contributing to the memory error, though as Uwe said in his reply to you, I wouldn't expect 27,808 records to cause a problem; I've successfully indexed hundreds of thousands. Of course, if any of these records are extremely large and complex, that could make a difference.

It might be interesting to see whether SolrMarc's multi-threaded mode and its thread settings make any difference. You can read about the thread options here: https://github.com/solrmarc/solrmarc/wiki/Other-command-line-options

You can use these options through the EXTRA_SOLRMARC_SETTINGS environment variable, e.g.:

EXTRA_SOLRMARC_SETTINGS="-Dsolrmarc.solrj.threadcount=1 -Dsolrmarc.indexer.threadcount=2" ./import-marc.sh ...

(Note that the values in that example aren't a suggestion, just an example.)

I'm not sure if there is currently a way to adjust how many records are batched together before sending to Solr.
Bob might be able to comment on that (and if it's possible, that would be a useful addition to the "other command-line options" wiki page).

Good luck! Let us know if you need more help!

- Demian

------------------------------
*From:* Leila Gonzales <lm...@ag...>
*Sent:* Thursday, March 23, 2017 12:09 AM
*To:* vuf...@li...
*Subject:* [VuFind-Tech] VuFind 2.4.1 to 3.1.3: Indexing fails with Java heap space / out of memory errors

Hi all,

I'm upgrading from VuFind 2.4.1 to 3.1.3, and on indexing my .mrc files I am running into Java heap memory errors. The file I'm loading has only 27,808 records in it, and I had no problem indexing it with VuFind 2.4.1, so I'm wondering what's causing the issue since upgrading to 3.1.3. From what I can tell, the import-marc.sh script is dying at the commit stage:

Now Importing /incoming/processed/importrecords.mrc ...
Mar 22, 23:43:55 /usr/lib/jvm/default-java/bin/java -Xms2G -Xmx2G -XX:+UseParallelGC -XX:+AggressiveOpts -XX:NewRatio=5 -DentityExpansionLimit=0 -Duser.timezone=UTC -jar /usr/local/vufind/import/solrmarc_core_3.0.6.jar /usr/local/vufind/local/import/import.properties -solrj /usr/local/vufind/solr/vendor/dist/solrj-lib /incoming/processed/importrecords.mrc
0 [main] DEBUG org.solrmarc.driver.ConfigDriver - Using config /usr/local/vufind/local/import/import.properties to initialize SolrMarc
3 [main] DEBUG org.solrmarc.tools.PropertyUtils - Opening file: /usr/local/vufind/local/import/import.properties
9 [main] INFO org.solrmarc.driver.ConfigDriver - Effective Command Line is:
10 [main] INFO org.solrmarc.driver.ConfigDriver - java -jar solrmarc_core.jar IndexDriver -reader_opts import.properties -dir /usr/local/vufind/local/import|/usr/local/vufind/import;local/import -config "marc.properties, marc_local.properties" -solrURL http://localhost:8080/solr/biblio/update -solrj /usr/local/vufind/solr/vendor/dist/solrj-lib /incoming/processed/importrecords.mrc
INFO [main] (ValueIndexerFactory.java:116) - Using directory: /usr/local/vufind/import/index_java as location of java sources
INFO [main] (PropertyUtils.java:313) - Opening file (instead of 2 other options): /usr/local/vufind/local/import/import.properties
DEBUG [main] (SolrCoreLoader.java:80) - Found Solrj class org.apache.solr.client.solrj.impl.HttpSolrClient
INFO [main] (IndexDriver.java:165) - Reading and compiling index specifications: marc.properties, marc_local.properties
INFO [main] (IndexDriver.java:229) - Opening index spec file: /usr/local/vufind/import/marc.properties
INFO [main] (IndexDriver.java:229) - Opening index spec file: /usr/local/vufind/import/marc_local.properties
DEBUG [main] (ScriptValueExtractorFactory.java:41) - Load bean shell script: crrelDbaseName.bsh
DEBUG [main] (ScriptValueExtractorFactory.java:41) - Load bean shell script: georefFormat.bsh
DEBUG [main] (ScriptValueExtractorFactory.java:41) - Load bean shell script: georefPublisher.bsh
DEBUG [main] (ScriptValueExtractorFactory.java:41) - Load bean shell script: georefContainerInfo.bsh
DEBUG [main] (ScriptValueExtractorFactory.java:41) - Load bean shell script: georefKeywordTerms.bsh
DEBUG [main] (ScriptValueExtractorFactory.java:41) - Load bean shell script: georefCategoryCodes.bsh
DEBUG [main] (ScriptValueExtractorFactory.java:41) - Load bean shell script: georefNote.bsh
DEBUG [main] (ScriptValueExtractorFactory.java:41) - Load bean shell script: georefDOIURL.bsh
INFO [main] (IndexDriver.java:93) - Opening input files: [/incoming/processed/importrecords.mrc]
INFO [main] (ThreadedIndexer.java:221) - Done with all indexing, finishing writing records to solr
ERROR [SolrUpdateOnError_143170_143191] (Indexer.java:421) - Failed on single doc with id : 143179
ERROR [SolrUpdateOnError_143170_143191] (Indexer.java:431) - Error from server at http://localhost:8080/solr/biblio: Exception writing document id 143179 to the index; possible analysis error.
The errors I am getting in the Solr Admin UI logs are:

org.apache.solr.common.SolrException: Exception writing document id 143179 to the index; possible analysis error.
...
Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
...
Caused by: java.lang.OutOfMemoryError: Java heap space

I've tried modifying the following files, but nothing has worked so far:

solr.sh: set SOLR_HEAP to 2G
import-marc.sh: set INDEX_OPTIONS='-Xms2G -Xmx2G -XX:+UseParallelGC -XX:+AggressiveOpts -XX:NewRatio=5 -DentityExpansionLimit=0'

(My previous settings for VuFind 2.4.1 in vufind.sh were JAVA_OPTIONS="-server -Xms1024m -Xmx1024m -XX:+UseParallelGC -XX:NewRatio=5", and I also tried setting the Xms/Xmx to 1024M in import-marc.sh and solr.sh, but to no avail.)

And my current settings are:

Using Solr root directory: /usr/local/vufind/solr/vendor
Using Java: /usr/lib/jvm/default-java/bin/java
java version "1.7.0_79"
OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79-2.5.5-0ubuntu0.14.04.2)
OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode)
Backing up /usr/local/vufind/solr/vufind/logs/solr.log
Backing up /usr/local/vufind/solr/vufind/logs/solr_gc.log
Starting Solr using the following settings:
    JAVA            = /usr/lib/jvm/default-java/bin/java
    SOLR_SERVER_DIR = /usr/local/vufind/solr/vendor/server
    SOLR_HOME       = /usr/local/vufind/solr/vufind
    SOLR_HOST       =
    SOLR_PORT       = 8080
    STOP_PORT       = 7080
    JAVA_MEM_OPTS   = -Xms2G -Xmx2G
    GC_TUNE         = -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark -XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000 -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -XX:CMSFullGCsBeforeCompaction=1 -XX:CMSTriggerPermRatio=80
    GC_LOG_OPTS     = -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -Xloggc:/usr/local/vufind/solr/vufind/logs/solr_gc.log
    SOLR_TIMEZONE   = UTC
    SOLR_OPTS       = -Xss256k
    SOLR_ADDL_ARGS  = -Dsolr.log=/usr/local/vufind/solr/vufind/logs

Any ideas on what to do next? Should I try adjusting the GC_TUNE settings in solr/vendor/bin/solr.in.sh? Is there somewhere else I should be looking? Thanks for any guidance you can send my way.

Kind regards,
Leila

--
You received this message because you are subscribed to the Google Groups "solrmarc-tech" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sol...@go....
To post to this group, send email to sol...@go....
Visit this group at https://groups.google.com/group/solrmarc-tech
For more options, visit https://groups.google.com/d/optout
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Vufind-tech mailing list
Vuf...@li...
https://lists.sourceforge.net/lists/listinfo/vufind-tech