From: <bjm...@mi...> - 2007-02-06 09:26:34

Ben,

I saw your comment in the "clucene-contrib-0.9.14.tar.gz ?" thread.
On 28th December I sent a message to the list about finding the
cause of a memory leak in the Term Vector code. I am not sure
if you saw it. The SourceForge list sent me back a copy, but it
doesn't appear in the archive.

There was no response, and no changes in the SVN trunk, so I
have duplicated the message below. There was also a previous message
about the 'Read past EOF' exception only happening with empty documents.

Did it get lost?

Regards,
Barry.

Copy of 28th December 2006 message follows:

Hi Ben,

Further update.

Re. Read past EOF exception with empty documents with stored
term vectors.

This was also reported as a problem with Java Lucene at the end of
2004. The equivalent fix code appears to be present in CLucene, but
I need to do some further investigation to find out why it isn't working.
For the time being I have implemented a local hack: catch the
exception and set fieldCount to zero.

Re. Term Vector memory leak.

Just in case you haven't already found this, a leak occurs in
the SegmentTermPositionVector destructor (freeing nested arrays).
The outer array is deleted inside the loop, after only its first
element has been freed, so the rest of the entries are leaked.
I modified the code as below (see the *** comments) to delete the
outer arrays after the loops. This seems to have fixed the leakage.

SegmentTermPositionVector::~SegmentTermPositionVector(){
    if ( offsets ){
        for (size_t i=0;i<offsets->length;i++){
            if ( offsets->values != NULL ){
                Array<TermVectorOffsetInfo>& offs = offsets->values[i];
                for ( size_t j=0;j<offs.length;j++ ){
                    _CLDELETE_ARRAY(offs.values);
                }
                // BJM *** _CLDELETE_ARRAY(offsets->values);
            }
        }
        _CLDELETE_ARRAY(offsets->values);  // *** Moved here
        _CLDELETE(offsets);
    }
    if ( positions ){
        for (size_t i=0;i<positions->length;i++){
            if ( positions->values != NULL ){
                Array<int32_t>& pos = positions->values[i];
                for ( size_t j=0;j<pos.length;j++ ){
                    _CLDELETE_ARRAY(pos.values);
                }
                // BJM *** _CLDELETE_ARRAY(positions->values);
            }
        }
        _CLDELETE_ARRAY(positions->values); // *** Moved here
        _CLDELETE(positions);
    }
}

HTH
Regards,
Barry.

> Hi Ben,
>
> Quick update.
>
> There were silly bugs in the test case I posted, but these did give a
> clue to the problem. It appears that the failure only happens on
> effectively empty documents, i.e. those of zero length, or containing
> just stop words. I don't know if this can be considered a bug, or how
> Java Lucene handles a similar situation. Ideally it should just store
> a zero length term vector and not abort with exception.
>
> I am still investigating the above, plus a reported, but not confirmed,
> memory leak in the term vector code. I will post another update
> when I make further progress.
>
> Best wishes for the holiday season.
>
> Barry.
>
> On 7 Dec 2006 at 11:17, Ben van Klinken wrote:
>
> > Thanks for the update barry.
> >
> > ben
> >
> > On 07/12/06, bjm...@mi... <bjm...@mi...> wrote:
> > > Thanks Ben,
> > >
> > > I have now confirmed the problem also occurs on Linux. Fresh SVN
> > > checkout on Fedora Core 5.
> > >
> > > As the exception stack trace suggests, the failure always seems to
> > > happen at segment merge. If I change the 'merge factor' setting
> > > from 10 (default) to 12, the test prog fails at document 120 instead
> > > of 100.
> > >
> > > Will post another follow-up if I spot anything else which appears
> > > relevant.
> > >
> > > Barry.
> > >
> > > -------------------------------------------------------------------------
> > > Take Surveys. Earn Cash. Influence the Future of IT
> > > Join SourceForge.net's Techsay panel and you'll get the chance to share your
> > > opinions on IT & business topics through brief surveys - and earn cash
> > > http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
> > > _______________________________________________
> > > CLucene-developers mailing list
> > > CLu...@li...
> > > https://lists.sourceforge.net/lists/listinfo/clucene-developers
> > >
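
For anyone wanting to see the ownership pattern behind the destructor fix in isolation, here is a minimal sketch. It is not CLucene code: the `Array` struct, the `Tracked` element type, and the helper names are all hypothetical stand-ins, and plain `delete[]` is used in place of the `_CLDELETE_ARRAY` macro. It assumes, as in the message above, that the outer array's elements do not free their own `values`, so each inner array has to be released before the outer array is deleted once, after the loop.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a pointer-plus-length array wrapper.
// It has no destructor, so deleting an array of these does NOT
// release the inner `values` buffers.
template <typename T>
struct Array {
    T* values = nullptr;
    std::size_t length = 0;
};

// Element type that counts live instances so a leak becomes observable.
struct Tracked {
    static int live;
    Tracked() { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

using Nested = Array<Array<Tracked>>;

// Allocates `outer` inner arrays, each holding `inner` Tracked elements.
Nested* makeNested(std::size_t outer, std::size_t inner) {
    Nested* n = new Nested;
    n->values = new Array<Tracked>[outer];
    n->length = outer;
    for (std::size_t i = 0; i < outer; ++i) {
        n->values[i].values = new Tracked[inner];
        n->values[i].length = inner;
    }
    return n;
}

// Frees every inner array first, then the outer array exactly once.
// Deleting the outer array inside the loop would end the cleanup after
// the first pass and leave the remaining inner arrays allocated.
void freeNested(Nested*& n) {
    if (n == nullptr) return;
    if (n->values != nullptr) {
        for (std::size_t i = 0; i < n->length; ++i) {
            delete[] n->values[i].values;
            n->values[i].values = nullptr;
        }
        delete[] n->values;  // outer array released after the loop
        n->values = nullptr;
    }
    delete n;
    n = nullptr;
}

// Builds a 3x4 nested array, frees it, and returns the number of
// Tracked elements still alive afterwards.
int liveAfterRoundTrip() {
    Nested* n = makeNested(3, 4);
    assert(n->length == 3);
    freeNested(n);
    return Tracked::live;
}
```

Calling `liveAfterRoundTrip()` from a small test program and checking that it returns zero is one way to confirm that nothing is left allocated. The instance counter is only a teaching device; on a real build a tool such as Valgrind would be the more usual check.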