As for me most imported is "ramBufferSizeMB=16.0"
So, as i understood, lucene by default want to use 16 Mb or RAM for Buffer
Then i turned off AutoTuneMemory and set up a different amount of MaxBufferedDocs.
When current MaxBufferedDocs take a less than 16Mb in RAM then indexing procedure read exactly MaxBufferedDocs rows from table
*** 2011-10-17 17:25:56.440
IW 0 : flush: now pause all indexing threads
IW 0 : flush: segment=_1 docStoreSegment=_0 docStoreOffset=8000 flushDocs=true flushDeletes=false flushDocStores=false numDocs=8000 numBufDelTerms=0
IW 0 : index before flush _0:C8000->_0
IW 0 : DW: flush postings as segment _1 numDocs=8000
*** 2011-10-17 17:25:57.596
IW 0 : DW: oldRAMSize=15047680 newFlushedSize=2993747 docs/MB=2,802.043 new/old=19.895%
IFD : now checkpoint "segments_1"
IFD : now checkpoint "segments_1"
IW 0 : LMP: findMerges: 2 segments
IW 0 : LMP: level 3.1123598 to 3.2381074: 2 segments
IW 0 : CMS: now merge
IW 0 : CMS: index: _0:C8000->_0 _1:C8000->_0
IW 0 : CMS: no more merges pending; now return
But when MaxBufferedDocs take a more than 16Mb, then indexing procedure read so much documents so much keep in 16 of RAM.
*** 2011-10-18 11:53:40.877
IW 4 : DW: RAM: now flush @ usedMB=16.009 allocMB=16.072 deletesMB=0 triggerMB=16
IW 4 : flush: now pause all indexing threads
IW 4 : flush: segment=_1 docStoreSegment=_0 docStoreOffset=9088 flushDocs=true flushDeletes=false flushDocStores=false numDocs=9131 numBufDelTerms=0
IW 4 : index before flush _0:C9088->_0
IW 4 : DW: flush postings as segment _1 numDocs=9131
*** 2011-10-18 11:53:42.002
IW 4 : DW: oldRAMSize=16786432 newFlushedSize=3370661 docs/MB=2,840.555 new/old=20.08%
IFD : now checkpoint "segments_1"
IFD : now checkpoint "segments_1"
IW 4 : LMP: findMerges: 2 segments
IW 4 : LMP: level 2.7051885 to 2.8368666: 2 segments
IW 4 : CMS: now merge
IW 4 : CMS: index: _0:C9088->_0 _1:C9131->_0
IW 4 : CMS: no more merges pending; now return
And last experiment. When i turned on AutoTuneMemory, then indexing procedure start use 50% of JAVA_PULL_SIZE, for me it is 51Mb.
** 2011-10-18 10:57:58.471
IW 3 : DW: RAM: now flush @ usedMB=51.002 allocMB=51.065 deletesMB=0 triggerMB=51
IW 3 : flush: now pause all indexing threads
IW 3 : flush: segment=_1 docStoreSegment=_0 docStoreOffset=37459 flushDocs=true flushDeletes=false flushDocStores=false numDocs=37512 numBufDelTerms=0
IW 3 : index before flush _0:C37459->_0
IW 3 : DW: flush postings as segment _1 numDocs=37512
*** 2011-10-18 10:58:02.830
IW 3 : DW: oldRAMSize=53479424 newFlushedSize=11689904 docs/MB=3,364.799 new/old=21.859%
IFD : now checkpoint "segments_1"
IFD : now checkpoint "segments_1"
IW 3 : LMP: findMerges: 2 segments
IW 3 : LMP: level 2.7051885 to 3.071586: 2 segments
IW 3 : CMS: now merge
IW 3 : CMS: index: _0:C37459->_0 _1:C37512->_0
IW 3 : CMS: no more merges pending; now return
Other interesting thing, it's that the best performance is when indexing procedure use full 16Mb, but not more.
But in other side, for some tables numDocs can be 100-150 instead of 10000 - 16000.
When i start indexing table where just 150-160 rows have put into 16Mb segment in RAM
CREATE INDEX TT$LDI$ID ON TEST_TEST(ID) INDEXTYPE IS lucene.LuceneIndex
PARAMETERS( 'Stemmer:Russian;LogLevel:INFO;AutoTuneMemory:false;MergeFactor:200;MaxBufferedDocs:16000;LobStorageParameters:PCTVERSION 0 ENABLE STORAGE IN ROW CHUNK 32768 CACHE READS FILESYSTEM_LIKE_LOGGING;ExtraCols:TEXT0,TEXT1,TEXT2,TEXT3,TEXT4,TEXT5,TEXT6,TEXT7,TEXT8,TEXT9');
I have gotten next, first merge (after 200-th segment) has finished without any trouble. But after next 200 segments merge hasn't started and on 400-th segments i have gotten next:
*** 2011-10-18 13:41:55.096
IW 1 : DW: oldRAMSize=16871424 newFlushedSize=3230188 docs/MB=49.017 new/old=19.146%
IFD : now checkpoint "segments_1"
IFD : now checkpoint "segments_1"
IW 1 : LMP: findMerges: 401 segments
IW 1 : LMP: level 3.0416162 to 3.7916162: 1 segments
IW 1 : LMP: level 2.7051885 to 2.82961: 400 segments
IW 1 : LMP: 1 to 201: add this merge
IW 1 : LMP: 201 to 401: add this merge
IW 1 : add merge to pendingMerges: _b6:C157->_0 _b7:C155->_0 _b8:C154->…a long monotonous string…->_0 _go:C158->_0 _gp:C151->_0
IW 1 : CMS: now merge
IW 1 : CMS: index: _5k:C31169->_0 _5l:C158->_0 _5m:C155->…a long monotonous string…->_0 _gl:C155->_0 _gm:C152->_0 _gn:C160->_0 _go:C158->_0 _gp:C151->_0
IW 1 : CMS: too many merge threads running; stalling…
after this indexing held and session froze.
So, my questions are:
What is mean this 16Mb? can i change this parameter? or how does it calculated?
Is it true, that 16Mb buffer is the best for performance? or is it just a coincident?
Why indexing freezes after has indexed MergeFactor*3 segments? is it bug or i did something wrong?
How i can avoid freezing of indexing process?
P.S. Another bug which i found due the testing.
I can't add maxFieldLength into parameters.
For example:
CREATE INDEX D11$LDI$ID ON D_LST_11(ID) INDEXTYPE IS lucene.LuceneIndex
PARAMETERS( 'Stemmer:Russian;LogLevel:INFO;AutoTuneMemory:false;maxFieldLength:20000;MergeFactor:200;MaxBufferedDocs:16000;LobStorageParameters:PCTVERSION 0 ENABLE STORAGE IN ROW CHUNK 32768 CACHE READS FILESYSTEM_LIKE_LOGGING;FormatCols:LAST_CHK_ID(000000000000000000000000000);ExtraCols:LAST_CHK_ID,C_408,C_412,C_413,C_414,C_417,C_418,C_419,C_420,C_421,C_422,C_423,C_428,C_430,C_431,C_432,C_433')
ORA-29855: error occurred in the execution of ODCIINDEXCREATE routine
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Hi:
Follwing the answers:
* What is mean this 16Mb? can i change this parameter? or how does it calculated?
When AutoTuneMemory is true, this value is calculated using the 50% of the value returned by the OVJM call OracleRuntime.getJavaPoolSize() which is equal to init.ora parameter java_pool_size.
We choose the 50% because during merge is normal that other Lucene structures fills more RAM that the defined by index writer setRAMBufferSizeMB parameter.
* Is it true, that 16Mb buffer is the best for performance? or is it just a coincident?
If you change the SGA configuration this values will change.
* Why indexing freezes after has indexed MergeFactor*3 segments? is it bug or i did something wrong?
If the process complete freeze and no CPU usage is reported by the Oracle process may be is a bug, if your Oracle process gets 100% of CPU usage is normal because Lucene merges process are quite expensive in terms of complexity.
* How i can avoid freezing of indexing process?
Increasing the SGA size should be fix this problem, 16Mb for indexing purpose is a little value. maxFieldLength should be MaxFieldLength all parameters start with upper case letters.
On the other hand Lucene project was moved to ldi.sourceforget.net
You can find a full documentation site at: http://ludoix.wordpress.com/
specially the appendix http://ludoix.wordpress.com/2011/03/11/ldi-docs-appendix-a-parameter-reference-and-syntax/
There is a new version using lucene-3.3.0 core base available for download using SVN, I'll add a binary version ASAP.
Best regards, Marcelo.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
*What is mean this 16Mb? can i change this parameter? or how does it calculated?
But i turned off AutoTuneMemory. When I turned it on I got a 50% of java pool size (for me it 50Mb), as you said. But when i turned it off i got just 16Mb. but anyway i'll try to increase SGA and chk again.
*Why indexing freezes after has indexed MergeFactor*3 segments? is it bug or i did something wrong?
No, CPU didn't use. I'll try chk on other version of oracle.
So, is it right, that merge should start after each "MergeFactor" segments? because I don't have it.
* unfortunately MaxFieldLength didn't work too
*** 2011-10-18 16:25:55.643
Oct 18, 2011 4:25:55 PM org.apache.lucene.indexer.LuceneDomainIndex ODCIIndexCreate
WARNING: Not validad parameter name: MaxFieldLength
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
* I checked freezing with the same versions of Oracle DB (11.2.0.1.0) and Lucene Domain Index (3.0.2.1.0) and all work fine without any freezing. So I guess it can be bug in Oracle Windows version.
* On new DB I have 800MB of SGA (before was 1300) but anyway if i turned off AutoTuneMemory, then Lucene read to buffer not more 16MB
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
* I checked freezing with the same versions of Oracle DB (11.2.0.1.0) and Lucene Domain Index (3.0.2.1.0) and all work fine without any freezing. So I guess it can be bug in Oracle Windows version.
I am sorry, it not absolutely right.When I decreased rows amount on windows DB (from 100K to 8K) indexing finished fine (MergeFactor set up to 10 and AutoTuneMemory set to off, that give me 150 rows in one segment). But each new merge contained one additional segment, i.e. 10, 11, 12, 13,…
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Hi:
Let me check with old source versions.
If you want to test latest binary version based on Lucene 3.2.x stack just drop me an email to marcelo.ochoa-at-gmail.com
Another LDI commiter already checked latest development version on windows with sucess.
Best regards, Marcelo.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Hi, thanks for help.
Yes, i would like get the binaries to test, because for me java is new technology and i am afraid source compilations may be difficult for me.
Bat i thought, for the first check all with my current version, because i could have made a mistake, and only then setup new version of source. so now i check the same in DB on linux and with a big table.
BTW, in both DB i have seen errors
kewastUnPackStats(): bad magic 1 (0x1B74D34C, 0)
in alert log.
As i understand it bug fix in new version of oracle (11.2.0.2.0), so i will plane to check it too.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I setup DB 11.2.0.2. for Win and install there lucene 3.3.0.1.0, which you send me yesterday. Additional i set up new source to oracle 11.2.0.1. Freeze for now was absent.
Second good news - MaxFieldLength parameter work with new Lucnen version.
But anywhere i can't got buffer size more then 16MB when i turned off AutoTuneMemory.
So, i guess, i must use only AutoTuneMemory parameter set up to true, if i want good performance.
Above is full lucene log (look on ramBufferSizeMB parameter), maybe you will be able find something interesting.
And the last. I very often (50% or more) get a ORA-600 in testMasterDetail test case. I it's interesting for you I can open a new topic on ldi project page about it problem.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Hi:
I followed the code of Lucene IndexWriterConfig object and by setting only MaxBufferedDocs cause that it set by default 16Mb as limit of the RAM usage.
I patched the code into the SVN version which now is tested with Lucene 3.4.0, now defining MaxBufferedDocs also defines ramBufferSizeMB as the value returned by OracleRuntime.getJavaPoolSize().
But I recomend to use AutoTuneMemory:true which is safer and easy to setup. Latest code base also is using TieredMergePolicy class which is faster than the previous one LogBySizeMergePolicy.
AutoTuneMemory:true prevents OOM condition when a batch of rows is bigger than the RAM reported by getJavaPoolSize, for example if first 100.000 rows of the table fits in 64Mb RAM is not a guaranty that next 100.000 rows can fit in the same size specially if they differs a lot in the size or the dispersion of the text being indexed,
Regarding to the ora-600 error please open a new thread at LDI project support forum: https://sourceforge.net/projects/ldi/forums/forum/1388516
Best regards, Marcelo.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Hello.
I investigated influence index parameters on indexing speed and found one interesting thing.
I hope you can explain my results.
Example of creating index command.
Due to testing I jast was changing IndexOnRam, AutoTuneMemory, MergeFactor, MaxBufferedDocs parameters.
I used LDI 3.0.2.1.0 and Oracle 11.2.0.1 for Win.
When I creating index in log i see "init" string:
As for me most imported is "ramBufferSizeMB=16.0"
So, as i understood, lucene by default want to use 16 Mb or RAM for Buffer
Then i turned off AutoTuneMemory and set up a different amount of MaxBufferedDocs.
When current MaxBufferedDocs take a less than 16Mb in RAM then indexing procedure read exactly MaxBufferedDocs rows from table
But when MaxBufferedDocs take a more than 16Mb, then indexing procedure read so much documents so much keep in 16 of RAM.
And last experiment. When i turned on AutoTuneMemory, then indexing procedure start use 50% of JAVA_PULL_SIZE, for me it is 51Mb.
Other interesting thing, it's that the best performance is when indexing procedure use full 16Mb, but not more.
But in other side, for some tables numDocs can be 100-150 instead of 10000 - 16000.
When i start indexing table where just 150-160 rows have put into 16Mb segment in RAM
I have gotten next, first merge (after 200-th segment) has finished without any trouble. But after next 200 segments merge hasn't started and on 400-th segments i have gotten next:
after this indexing held and session froze.
So, my questions are:
P.S. Another bug which i found due the testing.
I can't add maxFieldLength into parameters.
For example:
ORA-29855: error occurred in the execution of ODCIINDEXCREATE routine
Hi:
Follwing the answers:
* What is mean this 16Mb? can i change this parameter? or how does it calculated?
When AutoTuneMemory is true, this value is calculated using the 50% of the value returned by the OVJM call OracleRuntime.getJavaPoolSize() which is equal to init.ora parameter java_pool_size.
We choose the 50% because during merge is normal that other Lucene structures fills more RAM that the defined by index writer setRAMBufferSizeMB parameter.
* Is it true, that 16Mb buffer is the best for performance? or is it just a coincident?
If you change the SGA configuration this values will change.
* Why indexing freezes after has indexed MergeFactor*3 segments? is it bug or i did something wrong?
If the process complete freeze and no CPU usage is reported by the Oracle process may be is a bug, if your Oracle process gets 100% of CPU usage is normal because Lucene merges process are quite expensive in terms of complexity.
* How i can avoid freezing of indexing process?
Increasing the SGA size should be fix this problem, 16Mb for indexing purpose is a little value.
maxFieldLength should be MaxFieldLength all parameters start with upper case letters.
On the other hand Lucene project was moved to ldi.sourceforget.net
You can find a full documentation site at:
http://ludoix.wordpress.com/
specially the appendix http://ludoix.wordpress.com/2011/03/11/ldi-docs-appendix-a-parameter-reference-and-syntax/
There is a new version using lucene-3.3.0 core base available for download using SVN, I'll add a binary version ASAP.
Best regards, Marcelo.
Thanks for fast answer.
*What is mean this 16Mb? can i change this parameter? or how does it calculated?
But i turned off AutoTuneMemory. When I turned it on I got a 50% of java pool size (for me it 50Mb), as you said. But when i turned it off i got just 16Mb. but anyway i'll try to increase SGA and chk again.
*Why indexing freezes after has indexed MergeFactor*3 segments? is it bug or i did something wrong?
No, CPU didn't use. I'll try chk on other version of oracle.
So, is it right, that merge should start after each "MergeFactor" segments? because I don't have it.
* unfortunately MaxFieldLength didn't work too
* I checked freezing with the same versions of Oracle DB (11.2.0.1.0) and Lucene Domain Index (3.0.2.1.0) and all work fine without any freezing. So I guess it can be bug in Oracle Windows version.
* On new DB I have 800MB of SGA (before was 1300) but anyway if i turned off AutoTuneMemory, then Lucene read to buffer not more 16MB
I am sorry, it not absolutely right.When I decreased rows amount on windows DB (from 100K to 8K) indexing finished fine (MergeFactor set up to 10 and AutoTuneMemory set to off, that give me 150 rows in one segment). But each new merge contained one additional segment, i.e. 10, 11, 12, 13,…
Hi:
Let me check with old source versions.
If you want to test latest binary version based on Lucene 3.2.x stack just drop me an email to marcelo.ochoa-at-gmail.com
Another LDI commiter already checked latest development version on windows with sucess.
Best regards, Marcelo.
Hi, thanks for help.
Yes, i would like get the binaries to test, because for me java is new technology and i am afraid source compilations may be difficult for me.
Bat i thought, for the first check all with my current version, because i could have made a mistake, and only then setup new version of source. so now i check the same in DB on linux and with a big table.
BTW, in both DB i have seen errors
in alert log.
As i understand it bug fix in new version of oracle (11.2.0.2.0), so i will plane to check it too.
Hello.
I have been continued may investigation.
I setup DB 11.2.0.2. for Win and install there lucene 3.3.0.1.0, which you send me yesterday. Additional i set up new source to oracle 11.2.0.1. Freeze for now was absent.
Second good news - MaxFieldLength parameter work with new Lucnen version.
But anywhere i can't got buffer size more then 16MB when i turned off AutoTuneMemory.
So, i guess, i must use only AutoTuneMemory parameter set up to true, if i want good performance.
Above is full lucene log (look on ramBufferSizeMB parameter), maybe you will be able find something interesting.
And the last. I very often (50% or more) get a ORA-600 in testMasterDetail test case. I it's interesting for you I can open a new topic on ldi project page about it problem.
Hi:
I followed the code of Lucene IndexWriterConfig object and by setting only MaxBufferedDocs cause that it set by default 16Mb as limit of the RAM usage.
I patched the code into the SVN version which now is tested with Lucene 3.4.0, now defining MaxBufferedDocs also defines ramBufferSizeMB as the value returned by OracleRuntime.getJavaPoolSize().
But I recomend to use AutoTuneMemory:true which is safer and easy to setup. Latest code base also is using TieredMergePolicy class which is faster than the previous one LogBySizeMergePolicy.
AutoTuneMemory:true prevents OOM condition when a batch of rows is bigger than the RAM reported by getJavaPoolSize, for example if first 100.000 rows of the table fits in 64Mb RAM is not a guaranty that next 100.000 rows can fit in the same size specially if they differs a lot in the size or the dispersion of the text being indexed,
Regarding to the ora-600 error please open a new thread at LDI project support forum:
https://sourceforge.net/projects/ldi/forums/forum/1388516
Best regards, Marcelo.
Hi.
Thanks for your help. For now it's more clearly for me.
I think that use auto settings will be better, too.