#216 IndexWriter::optimize() may cause memory leak


Steps to reproduce:

1. Delete some documents from the existing index, then close the reader and searcher.
2. Add some new documents to the index using an IndexWriter.
3. Call IndexWriter::optimize() and IndexWriter::close().
4. Delete the analyzer (a StopAnalyzer).
5. Delete the IndexWriter.

Now, Visual Leak Detector reports two memory leaks in StopAnalyzer::reusableTokenStream:

streams = _CLNEW SavedStreams();                                        ---> deleted
streams->source = _CLNEW LowerCaseTokenizer(reader);                    ---> not deleted
streams->result = _CLNEW StopFilter(streams->source, true, stopTable);  ---> not deleted

I traced the code and found that IndexWriter::merge() deletes the FieldsReader. FieldsReader has a member variable (fieldsStreamTL) which deletes all TLS data held in _ThreadLocal when FieldsReader::~FieldsReader() is called, so the StopAnalyzer's SavedStreams object itself is deleted successfully; that part is fine.
But when the test program later calls 'delete analyzer', getPreviousTokenStream() in StopAnalyzer::~StopAnalyzer() returns NULL because the TLS entry is already gone, so the two members of SavedStreams are never freed.

My workaround is to move "delete analyzer" to before the call to IndexWriter::optimize(); with that ordering the leak disappears.

You can also reproduce the problem by indexing a large number of documents (450,000+, enough to make IndexWriter call merge()). The leak appears whenever IndexWriter::merge() is called; indexing a small number of documents does not trigger it, because merge() is never invoked.

I think the same problem exists in StandardAnalyzer and the other analyzers that use a SavedStreams object to wrap their token streams.

I am working on Windows 7 64-bit with Visual Studio 2008 and the latest version of Visual Leak Detector.


