From: Bill D. <bi...@du...> - 2008-12-11 14:25:09
|
I recently did a solrmarc import of our catalog, about 5 million records. I was at work for the first 800k or so and saw speeds that I'd seen with smaller files -- 400 records/second or so. Then I left. The whole run eventually ran at about 25 records/second. So....slower. I understand that merging and updating can take longer as the index gets larger, but does that seem...weird? And if not, is there anything I can do to mitigate the effect? -Bill- -- Bill Dueber Library Systems Programmer University of Michigan Library |
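The figures in the message above imply how slow the later part of the run must have been. The sketch below works through that arithmetic. It assumes that "25 records/second" is the average over the whole 5 million records and that the first 800k ran at a steady 400 records/second; the function and constant names are made up for illustration.

```python
# Figures taken from the message above. Treating 25 records/second as the
# whole-run average is an assumption, not something the message states.
TOTAL_RECORDS = 5_000_000
FAST_RECORDS = 800_000
FAST_RATE = 400.0      # records/second while the index was still small
OVERALL_RATE = 25.0    # records/second averaged over the full run


def late_phase_rate(total, fast_records, fast_rate, overall_rate):
    """Return the implied rate for the records indexed after the fast phase."""
    total_seconds = total / overall_rate
    fast_seconds = fast_records / fast_rate
    slow_seconds = total_seconds - fast_seconds
    return (total - fast_records) / slow_seconds


if __name__ == "__main__":
    rate = late_phase_rate(TOTAL_RECORDS, FAST_RECORDS, FAST_RATE, OVERALL_RATE)
    hours = TOTAL_RECORDS / OVERALL_RATE / 3600
    print(f"total run time: about {hours:.1f} hours")
    print(f"implied rate after the first 800k: about {rate:.1f} records/second")
```

Under those assumptions the fast phase accounts for only about half an hour of a run lasting more than two days, so nearly all of the time was spent at the slow rate.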
From: Barnett, J. <jef...@ya...> - 2008-12-11 16:31:37
|
Yale and Stanford (8M and 6M records) have had similar experiences and mitigated the slowdown somewhat by tweaking solrconfig to use bigger merge factors and buffer sizes, which reduces disk I/O. Also keep the number of intermediate commits low, and run the load offline from the production index. There are suggestions on sol...@lu... about using "shards" to do parallel loads, but I haven't heard of anyone in solrmarc land using that technique.

-----Original Message-----
From: Bill Dueber [mailto:bi...@du...]
Sent: Thursday, December 11, 2008 9:25 AM
To: vuf...@li...
Subject: [VuFind-Tech] Solrmarc indexing speed -- starts out fast, then fades

_______________________________________________
Vufind-tech mailing list
Vuf...@li...
https://lists.sourceforge.net/lists/listinfo/vufind-tech |
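To see why a bigger merge factor can reduce disk I/O during a bulk load, here is a toy model in Python. It is a deliberately simplified, level-based sketch and not Lucene's actual merge policy: every flush creates a segment of a fixed size, and whenever `merge_factor` segments accumulate at one level they are rewritten into a single segment at the next level. The `docs_per_flush` parameter and the function name are hypothetical.

```python
def simulate_merges(total_docs, docs_per_flush, merge_factor):
    """Return (merge_count, docs_rewritten) for a simplified level-based model.

    This is an illustration only; real Lucene merge policies choose merges
    by segment size and other thresholds.
    """
    levels = [[]]        # levels[i] holds the sizes of segments at level i
    merge_count = 0
    docs_rewritten = 0
    flushed = 0
    while flushed < total_docs:
        size = min(docs_per_flush, total_docs - flushed)
        flushed += size
        levels[0].append(size)
        level = 0
        # Cascade: a merge at one level may fill up the next level too.
        while len(levels[level]) >= merge_factor:
            merged = sum(levels[level])
            levels[level] = []
            merge_count += 1
            docs_rewritten += merged
            if level + 1 == len(levels):
                levels.append([])
            levels[level + 1].append(merged)
            level += 1
    return merge_count, docs_rewritten


if __name__ == "__main__":
    for factor in (10, 20):
        merges, rewritten = simulate_merges(5_000_000, 10_000, factor)
        print(f"mergeFactor={factor}: {merges} merges, {rewritten} docs rewritten")
```

In this model a larger merge factor means fewer merges and less rewriting during the load, at the cost of leaving more segments on disk, which tends to matter more for searching than for indexing.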
From: Greg P. <pen...@us...> - 2008-12-11 22:31:36
|
Hey Bill, Do you (or a sysadmin) have monitoring software that can look at memory consumption and swap space on your server? It would be interesting to see if it was a constant decay in performance as the index grew or your import hit a wall caused by an expanding memory footprint for solrmarc, either on the hardware or in the JVM. Of course you can perform a limited test of the first theory by importing some new records in a small sample on top of the existing 5mil+ records. Greg Pendlebury Electronic Services Officer (Systems Team) Division of Academic Information Services University of Southern Queensland Phone: +61 7 4631 1501 Fax: +61 7 4631 1841 |
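One way to tell a steady decay from a sudden wall is to log progress at intervals and compare the rate from one interval to the next. The sketch below assumes you can collect `(elapsed_seconds, records_indexed)` samples yourself, for example from the importer's log output; the function names are hypothetical and the sample numbers are invented purely for illustration, not measurements from this thread.

```python
def interval_rates(samples):
    """Turn (elapsed_seconds, records_indexed) samples into per-interval rates.

    Returns a list of (elapsed_seconds, records_per_second), one entry per
    interval, stamped with the time at the end of that interval.
    """
    rates = []
    for (t0, n0), (t1, n1) in zip(samples, samples[1:]):
        if t1 <= t0:
            raise ValueError("samples must have strictly increasing timestamps")
        rates.append((t1, (n1 - n0) / (t1 - t0)))
    return rates


def sharpest_drop(rates):
    """Return (elapsed_seconds, ratio) for the largest relative slowdown.

    ratio is current_rate / previous_rate, so values well below 1 suggest a
    wall, while ratios that all sit just under 1 suggest gradual decay.
    Returns None when there are fewer than two usable rates.
    """
    worst = None
    for (_, previous), (t, current) in zip(rates, rates[1:]):
        if previous <= 0:
            continue
        ratio = current / previous
        if worst is None or ratio < worst[1]:
            worst = (t, ratio)
    return worst


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    samples = [(0, 0), (60, 24_000), (120, 48_000), (180, 51_000)]
    rates = interval_rates(samples)
    print(rates)
    print(sharpest_drop(rates))
```

Lining up the time of the sharpest drop against memory and swap graphs from the server would show whether the slowdown coincides with memory pressure.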
From: Naomi D. <nd...@st...> - 2008-12-12 19:41:59
|
Bill,

Here's what we have in our solrconfig.xml:

  <indexDefaults>
    <useCompoundFile>false</useCompoundFile>
    <mergeFactor>20</mergeFactor>
    <ramBufferSizeMB>10240</ramBufferSizeMB>
    <maxMergeDocs>2147483647</maxMergeDocs>
    <writeLockTimeout>1000</writeLockTimeout>
    <commitLockTimeout>10000</commitLockTimeout>
    <lockType>single</lockType>
    <maxFieldLength>10000</maxFieldLength>
  </indexDefaults>

  <mainIndex>
    <useCompoundFile>false</useCompoundFile>
    <ramBufferSizeMB>10240</ramBufferSizeMB>
    <mergeFactor>20</mergeFactor>
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>10000</maxFieldLength>
    <unlockOnStartup>false</unlockOnStartup>
  </mainIndex>

And here's the java options we use on indexing:

  java -Xmx12g -Xms12g

Also, I have gotten rid of "unnecessary" stored fields and trimmed allFields down.

- Naomi

Naomi Dushay
nd...@st... |
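Since the RAM buffer lives inside the JVM heap, it is worth checking that `ramBufferSizeMB` stays below the `-Xmx` value whenever either one is changed. The Python sketch below does that for a trimmed copy of the settings quoted above. The wrapping `<config>` element and the function names are additions for the sake of the example, and the check says nothing about whether the values are optimal or whether a particular Solr version accepts a buffer this large.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the settings from the message above, wrapped in a
# hypothetical <config> root element so it parses as one XML document.
SOLRCONFIG_FRAGMENT = """
<config>
  <indexDefaults>
    <mergeFactor>20</mergeFactor>
    <ramBufferSizeMB>10240</ramBufferSizeMB>
  </indexDefaults>
  <mainIndex>
    <ramBufferSizeMB>10240</ramBufferSizeMB>
    <mergeFactor>20</mergeFactor>
  </mainIndex>
</config>
"""


def heap_mb(xmx):
    """Convert a -Xmx value such as '12g' or '512m' to megabytes."""
    unit = xmx[-1].lower()
    value = int(xmx[:-1])
    if unit == "g":
        return value * 1024
    if unit == "m":
        return value
    raise ValueError(f"unsupported heap size: {xmx!r}")


def check_ram_buffers(xml_text, xmx):
    """Map each section name to (ramBufferSizeMB, fits_in_heap)."""
    heap = heap_mb(xmx)
    result = {}
    for section in ET.fromstring(xml_text.strip()):
        text = section.findtext("ramBufferSizeMB")
        if text is not None:
            size = int(text)
            result[section.tag] = (size, size < heap)
    return result


if __name__ == "__main__":
    for name, (size, fits) in check_ram_buffers(SOLRCONFIG_FRAGMENT, "12g").items():
        print(f"{name}: ramBufferSizeMB={size}, fits in heap: {fits}")
```

With the quoted settings, a 10240 MB buffer against a 12 GB (12288 MB) heap leaves roughly 2 GB for everything else the indexing process needs.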