
indexing performance and freezing

Forum: Help
Created: 2011-10-18 · Last updated: 2013-04-15
  • ivan goncharov

    ivan goncharov - 2011-10-18

    Hello.
    I investigated the influence of index parameters on indexing speed and found one interesting thing.

    I hope you can explain my results.

    An example of the CREATE INDEX command:

    CREATE INDEX D11$LDI$ID ON D_LST_11(ID) INDEXTYPE IS lucene.LuceneIndex PARAMETERS( 'Stemmer:Russian;LogLevel:INFO;IndexOnRam:true;AutoTuneMemory:false;MergeFactor:200;MaxBufferedDocs:16000;LobStorageParameters:PCTVERSION 0 ENABLE STORAGE IN ROW CHUNK 32768 CACHE READS FILESYSTEM_LIKE_LOGGING;FormatCols:LAST_CHK_ID(000000000000000000000000000);ExtraCols:LAST_CHK_ID,C_408,C_412,C_413,C_414,C_417,C_418,C_419,C_420,C_421,C_422,C_423,C_428,C_430,C_431,C_432,C_433');
    

    During testing I only varied the IndexOnRam, AutoTuneMemory, MergeFactor, and MaxBufferedDocs parameters.

    I used LDI 3.0.2.1.0 and Oracle 11.2.0.1 for Windows.

    When I create the index, I see this "init" line in the log:

    IW 4 [Root Thread]: setInfoStream: dir=org.apache.lucene.store.OJVMDirectory@6f2ec910 mergePolicy=org.apache.lucene.index.LogByteSizeMergePolicy@fe44523b mergeScheduler=org.apache.lucene.index.ConcurrentMergeScheduler@a52a7537 [b]ramBufferSizeMB=16.0[/b] maxBufferedDocs=-1 maxBuffereDeleteTerms=-1 maxFieldLength=10000 index=
    

    To me the most important part is "ramBufferSizeMB=16.0".
    So, as I understand it, Lucene by default wants to use 16 MB of RAM for the buffer.
    Then I turned AutoTuneMemory off and tried different values of MaxBufferedDocs.
    When MaxBufferedDocs documents take less than 16 MB of RAM, the indexing procedure reads exactly MaxBufferedDocs rows from the table:

    *** 2011-10-17 17:25:56.440
      IW 0 : flush: now pause all indexing threads
      IW 0 :   flush: segment=_1 docStoreSegment=_0 docStoreOffset=8000 flushDocs=true flushDeletes=false flushDocStores=false numDocs=8000 numBufDelTerms=0
      IW 0 :   index before flush _0:C8000->_0
      IW 0 : DW: flush postings as segment _1 numDocs=8000
      *** 2011-10-17 17:25:57.596
      IW 0 : DW:   oldRAMSize=15047680 newFlushedSize=2993747 docs/MB=2,802.043 new/old=19.895%
      IFD : now checkpoint "segments_1"
      IFD : now checkpoint "segments_1"
      IW 0 : LMP: findMerges: 2 segments
      IW 0 : LMP:   level 3.1123598 to 3.2381074: 2 segments
      IW 0 : CMS: now merge
      IW 0 : CMS:   index: _0:C8000->_0 _1:C8000->_0
      IW 0 : CMS:   no more merges pending; now return

    But when MaxBufferedDocs documents take more than 16 MB, the indexing procedure reads only as many documents as fit into 16 MB of RAM:

    *** 2011-10-18 11:53:40.877
      IW 4 : DW:   RAM: now flush @ usedMB=16.009 allocMB=16.072 deletesMB=0 triggerMB=16
      IW 4 : flush: now pause all indexing threads
      IW 4 :   flush: segment=_1 docStoreSegment=_0 docStoreOffset=9088 flushDocs=true flushDeletes=false flushDocStores=false numDocs=9131 numBufDelTerms=0
      IW 4 :   index before flush _0:C9088->_0
      IW 4 : DW: flush postings as segment _1 numDocs=9131
      *** 2011-10-18 11:53:42.002
      IW 4 : DW:   oldRAMSize=16786432 newFlushedSize=3370661 docs/MB=2,840.555 new/old=20.08%
      IFD : now checkpoint "segments_1"
      IFD : now checkpoint "segments_1"
      IW 4 : LMP: findMerges: 2 segments
      IW 4 : LMP:   level 2.7051885 to 2.8368666: 2 segments
      IW 4 : CMS: now merge
      IW 4 : CMS:   index: _0:C9088->_0 _1:C9131->_0
      IW 4 : CMS:   no more merges pending; now return
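
    For reference, here is a minimal sketch of the two flush triggers involved above, using the plain Lucene 3.x IndexWriterConfig API rather than the LDI wrapper (LDI sets these internally from MaxBufferedDocs and the AutoTuneMemory logic, so the exact code may differ). A segment is flushed as soon as either limit is reached:

      import org.apache.lucene.analysis.standard.StandardAnalyzer;
      import org.apache.lucene.index.IndexWriter;
      import org.apache.lucene.index.IndexWriterConfig;
      import org.apache.lucene.store.RAMDirectory;
      import org.apache.lucene.util.Version;

      // Sketch only: plain Lucene 3.x, outside the database.
      public class FlushTriggerSketch {
          public static IndexWriter openWriter() throws Exception {
              IndexWriterConfig cfg = new IndexWriterConfig(
                      Version.LUCENE_33, new StandardAnalyzer(Version.LUCENE_33));
              cfg.setMaxBufferedDocs(16000); // flush after 16,000 buffered documents...
              cfg.setRAMBufferSizeMB(16.0);  // ...or after ~16 MB of buffered postings,
                                             // whichever limit is hit first (16.0 is the default)
              return new IndexWriter(new RAMDirectory(), cfg);
          }
      }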

     
    And the last experiment: when I turned AutoTuneMemory on, the indexing procedure started using 50% of JAVA_POOL_SIZE, which for me is 51 MB:

    *** 2011-10-18 10:57:58.471
      IW 3 : DW:   RAM: now flush @ usedMB=51.002 allocMB=51.065 deletesMB=0 triggerMB=51
      IW 3 : flush: now pause all indexing threads
      IW 3 :   flush: segment=_1 docStoreSegment=_0 docStoreOffset=37459 flushDocs=true flushDeletes=false flushDocStores=false numDocs=37512 numBufDelTerms=0
      IW 3 :   index before flush _0:C37459->_0
      IW 3 : DW: flush postings as segment _1 numDocs=37512
      *** 2011-10-18 10:58:02.830
      IW 3 : DW:   oldRAMSize=53479424 newFlushedSize=11689904 docs/MB=3,364.799 new/old=21.859%
      IFD : now checkpoint "segments_1"
      IFD : now checkpoint "segments_1"
      IW 3 : LMP: findMerges: 2 segments
      IW 3 : LMP:   level 2.7051885 to 3.071586: 2 segments
      IW 3 : CMS: now merge
      IW 3 : CMS:   index: _0:C37459->_0 _1:C37512->_0
      IW 3 : CMS:   no more merges pending; now return

    Another interesting thing is that the best performance is when the indexing procedure uses the full 16 MB, but not more.
    On the other hand, for some tables numDocs can be 100-150 instead of 10000-16000.
    Here is what happens when I start indexing a table where only 150-160 rows fit into the 16 MB in-RAM segment:

    CREATE INDEX TT$LDI$ID ON TEST_TEST(ID) INDEXTYPE IS lucene.LuceneIndex
      PARAMETERS( 'Stemmer:Russian;LogLevel:INFO;AutoTuneMemory:false;MergeFactor:200;MaxBufferedDocs:16000;LobStorageParameters:PCTVERSION 0 ENABLE STORAGE IN ROW CHUNK 32768 CACHE READS FILESYSTEM_LIKE_LOGGING;ExtraCols:TEXT0,TEXT1,TEXT2,TEXT3,TEXT4,TEXT5,TEXT6,TEXT7,TEXT8,TEXT9');

    I got the following: the first merge (after the 200th segment) finished without any trouble, but the merge after the next 200 segments never started, and at the 400th segment I got this:

    *** 2011-10-18 13:41:55.096
      IW 1 : DW:   oldRAMSize=16871424 newFlushedSize=3230188 docs/MB=49.017 new/old=19.146%
      IFD : now checkpoint "segments_1"
      IFD : now checkpoint "segments_1"
      IW 1 : LMP: findMerges: 401 segments
      IW 1 : LMP:   level 3.0416162 to 3.7916162: 1 segments
      IW 1 : LMP:   level 2.7051885 to 2.82961: 400 segments
      IW 1 : LMP:     1 to 201: add this merge
      IW 1 : LMP:     201 to 401: add this merge
      IW 1 : add merge to pendingMerges: _b6:C157->_0 _b7:C155->_0 _b8:C154->…a long monotonous string…->_0 _go:C158->_0 _gp:C151->_0
      IW 1 : CMS: now merge
      IW 1 : CMS:   index: _5k:C31169->_0 _5l:C158->_0 _5m:C155->…a long monotonous string…->_0 _gl:C155->_0 _gm:C152->_0 _gn:C160->_0 _go:C158->_0 _gp:C151->_0
      IW 1 : CMS:    too many merge threads running; stalling…

    After this, indexing halted and the session froze.
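
    To make the log above easier to read: "LMP" is the log merge policy configured by the MergeFactor parameter, and "CMS" is the ConcurrentMergeScheduler that runs merges on background threads and stalls the indexing thread when merges back up. A minimal sketch of how these two pieces are wired together in plain Lucene 3.x (assumed API; LDI builds the equivalent from the PARAMETERS string, so the details may differ):

      import org.apache.lucene.index.ConcurrentMergeScheduler;
      import org.apache.lucene.index.IndexWriterConfig;
      import org.apache.lucene.index.LogByteSizeMergePolicy;

      // Sketch only: the objects behind the "LMP" and "CMS" lines in the log.
      public class MergeConfigSketch {
          public static void configureMerging(IndexWriterConfig cfg) {
              LogByteSizeMergePolicy lmp = new LogByteSizeMergePolicy();
              lmp.setMergeFactor(200);  // MergeFactor:200 -> merge once 200 same-level segments exist
              ConcurrentMergeScheduler cms = new ConcurrentMergeScheduler();
              cms.setMaxThreadCount(1); // number of background merge threads (illustrative value)
              cfg.setMergePolicy(lmp);
              cfg.setMergeScheduler(cms);
          }
      }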

    So, my questions are:

    • What does this 16 MB mean? Can I change this parameter, and how is it calculated?
    • Is it true that a 16 MB buffer gives the best performance, or is it just a coincidence?
    • Why does indexing freeze after it has indexed MergeFactor*3 segments? Is it a bug, or did I do something wrong?
    • How can I avoid the indexing process freezing?

    P.S. Another bug which I found during this testing: I can't add maxFieldLength to the parameters.
    For example:

    CREATE INDEX D11$LDI$ID ON D_LST_11(ID) INDEXTYPE IS lucene.LuceneIndex
      PARAMETERS( 'Stemmer:Russian;LogLevel:INFO;AutoTuneMemory:false;maxFieldLength:20000;MergeFactor:200;MaxBufferedDocs:16000;LobStorageParameters:PCTVERSION 0 ENABLE STORAGE IN ROW CHUNK 32768 CACHE READS FILESYSTEM_LIKE_LOGGING;FormatCols:LAST_CHK_ID(000000000000000000000000000);ExtraCols:LAST_CHK_ID,C_408,C_412,C_413,C_414,C_417,C_418,C_419,C_420,C_421,C_422,C_423,C_428,C_430,C_431,C_432,C_433')

    ORA-29855: error occurred in the execution of ODCIINDEXCREATE routine

     
  • Marcelo F. Ochoa

    Hi:
       Following are the answers:
    * What does this 16 MB mean? Can I change this parameter, and how is it calculated?
      When AutoTuneMemory is true, this value is calculated as 50% of the value returned by the OJVM call OracleRuntime.getJavaPoolSize(), which is equal to the init.ora parameter java_pool_size (see the sketch at the end of this post).
      We chose 50% because during a merge it is normal for other Lucene structures to fill more RAM than what is defined by the index writer's setRAMBufferSizeMB parameter.
    * Is it true that a 16 MB buffer gives the best performance, or is it just a coincidence?
      If you change the SGA configuration, this value will change.
    * Why does indexing freeze after it has indexed MergeFactor*3 segments? Is it a bug, or did I do something wrong?
      If the process freezes completely and no CPU usage is reported by the Oracle process, it may be a bug; if your Oracle process is at 100% CPU usage, that is normal, because Lucene merge operations are quite expensive in terms of complexity.
    * How can I avoid the indexing process freezing?
      Increasing the SGA size should fix this problem; 16 MB is a small value for indexing purposes.
    maxFieldLength should be MaxFieldLength; all parameters start with an uppercase letter.
    On the other hand, the Lucene Domain Index project has moved to ldi.sourceforge.net
    You can find the full documentation site at:
    http://ludoix.wordpress.com/
    especially the appendix http://ludoix.wordpress.com/2011/03/11/ldi-docs-appendix-a-parameter-reference-and-syntax/
    There is a new version based on the lucene-3.3.0 core available for download via SVN; I'll add a binary version ASAP.
    Best regards, Marcelo.
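
    P.S. For illustration only, a minimal sketch of the AutoTuneMemory calculation described above, assuming the OJVM class oracle.aurora.vm.OracleRuntime (the actual LDI code may differ in its details):

      import oracle.aurora.vm.OracleRuntime;        // only available inside the Oracle JVM (OJVM)
      import org.apache.lucene.index.IndexWriterConfig;

      // Sketch only: AutoTuneMemory:true boils down to taking 50% of
      // java_pool_size (reported in bytes) as the IndexWriter RAM buffer.
      public class AutoTuneSketch {
          public static void autoTune(IndexWriterConfig cfg) {
              long javaPoolBytes = OracleRuntime.getJavaPoolSize();
              double ramBufferMB = (javaPoolBytes / (1024.0 * 1024.0)) * 0.5;
              cfg.setRAMBufferSizeMB(ramBufferMB);
          }
      }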

     
  • ivan goncharov

    ivan goncharov - 2011-10-18

    Thanks for the fast answer.

    * What does this 16 MB mean? Can I change this parameter, and how is it calculated?
    But I turned AutoTuneMemory off. When I turned it on I got 50% of the java pool size (for me that is 50 MB), as you said. But when I turned it off I got just 16 MB. Anyway, I'll try to increase the SGA and check again.

    * Why does indexing freeze after it has indexed MergeFactor*3 segments? Is it a bug, or did I do something wrong?
      No, the CPU was not being used. I'll try to check on another version of Oracle.
      So, is it right that a merge should start after every MergeFactor segments? Because I don't see that happening.

    * Unfortunately, MaxFieldLength didn't work either:

    *** 2011-10-18 16:25:55.643
    Oct 18, 2011 4:25:55 PM org.apache.lucene.indexer.LuceneDomainIndex ODCIIndexCreate
    WARNING: Not validad parameter name: MaxFieldLength
    
     
  • ivan goncharov

    ivan goncharov - 2011-10-19

    * I checked the freezing with the same versions of Oracle DB (11.2.0.1.0) and Lucene Domain Index (3.0.2.1.0), and everything works fine without any freezing. So I guess it could be a bug in the Oracle Windows version.

    * On the new DB I have 800 MB of SGA (before it was 1300 MB), but still, if I turn AutoTuneMemory off, Lucene reads no more than 16 MB into the buffer.

     
  • ivan goncharov

    ivan goncharov - 2011-10-19

    * I checked the freezing with the same versions of Oracle DB (11.2.0.1.0) and Lucene Domain Index (3.0.2.1.0), and everything works fine without any freezing. So I guess it could be a bug in the Oracle Windows version.

    I am sorry, that is not completely right. When I decreased the number of rows in the Windows DB (from 100K to 8K), indexing finished fine (MergeFactor set to 10 and AutoTuneMemory turned off, which gives about 150 rows per segment). But each new merge contained one additional segment, i.e. 10, 11, 12, 13, …

     
  • Marcelo F. Ochoa

    Hi:
      Let me check with the old source versions.
      If you want to test the latest binary version, based on the Lucene 3.2.x stack, just drop me an email at marcelo.ochoa-at-gmail.com
      Another LDI committer has already checked the latest development version on Windows with success.
      Best regards, Marcelo.

     
  • ivan goncharov

    ivan goncharov - 2011-10-19

    Hi, thanks for the help.
    Yes, I would like to get the binaries to test, because Java is a new technology for me and I am afraid that compiling from source may be difficult for me.
    But I thought I would first check everything with my current version, because I could have made a mistake, and only then set up the new version from source. So now I am checking the same thing in a DB on Linux and with a big table.

    BTW, in both DBs I have seen the following errors

    kewastUnPackStats(): bad magic 1 (0x1B74D34C, 0)
    

    in the alert log.
    As I understand, this bug is fixed in a newer version of Oracle (11.2.0.2.0), so I plan to check that too.

     
  • ivan goncharov

    ivan goncharov - 2011-10-20

    Hello.
    I have continued my investigation.

    I set up DB 11.2.0.2 for Windows and installed Lucene 3.3.0.1.0 there, which you sent me yesterday. Additionally, I installed the new source on Oracle 11.2.0.1. The freeze has not appeared so far.
    The second piece of good news: the MaxFieldLength parameter works with the new Lucene version.
    But I still can't get a buffer size of more than 16 MB when I turn AutoTuneMemory off.
    So I guess I must use the AutoTuneMemory parameter set to true if I want good performance.
    Below is the full Lucene log (look at the ramBufferSizeMB parameter); maybe you will be able to find something interesting.

    *** 2011-10-20 19:27:21.555
    Oct 20, 2011 6:27:21 PM org.apache.lucene.indexer.LuceneDomainIndex ODCIIndexCreate
    INFO: Indexing column: 'ID'
    Oct 20, 2011 6:27:21 PM org.apache.lucene.indexer.LuceneDomainIndex ODCIIndexCreate
    INFO: Internal Parameters (DEV_TEST.TT$LDI$ID):
    TableSchema:DEV_TEST;TableName:TEST_TEST;Partition:null;TypeName:NUMBER;ColName:ID;PopulateIndex:false
    Oct 20, 2011 6:27:21 PM org.apache.lucene.indexer.LuceneDomainIndex createLuceneStore
    INFO: .prepareCall 'call LuceneDomainIndex.createTable(?,?)'
    Oct 20, 2011 6:27:21 PM org.apache.lucene.indexer.LuceneDomainIndex createLuceneStore
    INFO: .setString 'DEV_TEST.TT$LDI$ID'
    Oct 20, 2011 6:27:21 PM org.apache.lucene.indexer.LuceneDomainIndex createLuceneStore
    INFO: .setString 'PCTVERSION 0 DISABLE STORAGE IN ROW CACHE READS NOLOGGING'
    Oct 20, 2011 6:27:22 PM org.apache.lucene.indexer.LuceneDomainIndex createLuceneStore
    INFO: .prepareCall 'call LuceneDomainAdm.createQueue(?)'
    Oct 20, 2011 6:27:22 PM org.apache.lucene.indexer.LuceneDomainIndex createLuceneStore
    INFO: .setString 'DEV_TEST.TT$LDI$ID'
    Oct 20, 2011 6:27:22 PM org.apache.lucene.indexer.LuceneDomainIndex ODCIIndexCreate
    INFO:  ExtraTabs: null WhereCondition: null LockMasterTable: true
    *** 2011-10-20 19:27:22.461
    IFD [Thu Oct 20 18:27:22 GMT+03:00 2011; Root Thread]: setInfoStream deletionPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy@9cdb3576
    IW 0 [Thu Oct 20 18:27:22 GMT+03:00 2011; Root Thread]: 
    dir=org.apache.lucene.store.OJVMDirectory@d705b660 lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@607a0330
    index=
    version=3.3-SNAPSHOT
    matchVersion=LUCENE_30
    analyzer=org.apache.lucene.analysis.snowball.SnowballAnalyzer
    delPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy
    commit=null
    openMode=CREATE
    similarity=org.apache.lucene.search.DefaultSimilarity
    termIndexInterval=128
    mergeScheduler=org.apache.lucene.index.ConcurrentMergeScheduler
    default WRITE_LOCK_TIMEOUT=1000
    writeLockTimeout=1000
    maxBufferedDeleteTerms=-1
    [b]ramBufferSizeMB=16.0[/b]
    maxBufferedDocs=500
    mergedSegmentWarmer=null
    mergePolicy=[LogDocMergePolicy: minMergeSize=1000, mergeFactor=10, maxMergeSize=9223372036854775807, maxMergeSizeForOptimize=9223372036854775807, calibrateSizeByDeletes=true, maxMergeDocs=2147483647, useCompoundFile=false, noCFSRatio=0.1]
    maxThreadStates=8
    readerPooling=false
    readerTermsIndexDivisor=1
    Oct 20, 2011 6:27:22 PM org.apache.lucene.indexer.LuceneDomainIndex getUserDataStore
    INFO: UserDataStoreClass=org.apache.lucene.indexer.DefaultUserDataStore
    Oct 20, 2011 6:27:22 PM org.apache.lucene.indexer.LuceneDomainIndex ODCIIndexCreate
    INFO: ExtraCols: TEXT0,TEXT1,TEXT2,TEXT3,TEXT4,TEXT5,TEXT6,TEXT7,TEXT8,TEXT9 User Data Store: org.apache.lucene.indexer.DefaultUserDataStore@391b9d30
    Oct 20, 2011 6:27:22 PM org.apache.lucene.indexer.TableIndexer index
    INFO: index(Logger logger, IndexWriter writer, String col, boolean withMasterColumn), Performing:
    SELECT /*+ DYNAMIC_SAMPLING(L$MT,0) RULE NOCACHE(L$MT) */ L$MT.rowid,L$MT."ID",TEXT0,TEXT1,TEXT2,TEXT3,TEXT4,TEXT5,TEXT6,TEXT7,TEXT8,TEXT9 FROM DEV_TEST.TEST_TEST L$MT for update nowait
    

    And the last thing: I very often (50% of the time or more) get an ORA-600 in the testMasterDetail test case. If it is of interest to you, I can open a new topic about this problem on the LDI project page.

     
  • Marcelo F. Ochoa

    Hi:
       I followed the code of the Lucene IndexWriterConfig object: setting only MaxBufferedDocs leaves the default 16 MB limit on RAM usage in place (see the sketch below).
      I patched the code in the SVN version, which is now tested with Lucene 3.4.0; defining MaxBufferedDocs now also sets ramBufferSizeMB from the value returned by OracleRuntime.getJavaPoolSize().
      But I recommend using AutoTuneMemory:true, which is safer and easier to set up. The latest code base also uses the TieredMergePolicy class, which is faster than the previous LogByteSizeMergePolicy.
      AutoTuneMemory:true prevents an OOM condition when a batch of rows is bigger than the RAM reported by getJavaPoolSize; for example, if the first 100,000 rows of the table fit in 64 MB of RAM, that is no guarantee that the next 100,000 rows will fit in the same size, especially if they differ a lot in size or in the dispersion of the text being indexed.
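
      For illustration only, here is a minimal sketch of that default behaviour in plain Lucene 3.x (assumed IndexWriterConfig API; the LDI wrapper does the equivalent internally):

      import org.apache.lucene.analysis.standard.StandardAnalyzer;
      import org.apache.lucene.index.IndexWriterConfig;
      import org.apache.lucene.util.Version;

      // Sketch only: setting maxBufferedDocs does NOT turn off the RAM trigger,
      // which stays at IndexWriterConfig.DEFAULT_RAM_BUFFER_SIZE_MB (16.0).
      public class DefaultRamBufferSketch {
          public static IndexWriterConfig onlyMaxBufferedDocs() {
              IndexWriterConfig cfg = new IndexWriterConfig(
                      Version.LUCENE_33, new StandardAnalyzer(Version.LUCENE_33));
              cfg.setMaxBufferedDocs(16000);
              // cfg.getRAMBufferSizeMB() is still 16.0 here, so flushes keep
              // happening at ~16 MB; to change that, also call
              // cfg.setRAMBufferSizeMB(...) with a larger value, or disable the
              // RAM trigger entirely with IndexWriterConfig.DISABLE_AUTO_FLUSH.
              return cfg;
          }
      }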
       Regarding the ORA-600 error, please open a new thread at the LDI project support forum:
    https://sourceforge.net/projects/ldi/forums/forum/1388516
       Best regards, Marcelo.

     
  • ivan goncharov

    ivan goncharov - 2011-10-21

    Hi.
    Thanks for your help. It is much clearer for me now.
    I think that using the auto settings will be better, too.

     
