In src\core\CLucene\index\DocumentsWriterThreadState.cpp, in the function void DocumentsWriter::ThreadState::FieldData::invertField()
(around line 892), the call stream = analyzer->reusableTokenStream(fieldInfo->name, reader); is supposed to return a reusable stream, but most of the analyzers simply create a new stream on every call.
I am not yet sure how to implement the reusable token stream correctly (I should probably read the latest Lucene code), but in my local build I simply delete the stream and the memory leak is gone.
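
As a rough sketch of what I understand "reusable" to mean: the analyzer keeps ownership of one cached stream and points it at the new reader on each call, so the caller never deletes it. The classes below (Reader, TokenStream, Analyzer, invertAll) are simplified stand-ins I made up for illustration, not the real CLucene classes or signatures, and the whitespace tokenizer is only there to make the example do something.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical stand-in for a reader; NOT the CLucene Reader class.
struct Reader {
    std::string text;
};

// Hypothetical stand-in for a token stream that splits on spaces.
class TokenStream {
public:
    static int liveCount;  // number of TokenStream objects currently alive

    explicit TokenStream(Reader* r) : reader_(r), pos_(0) { ++liveCount; }
    ~TokenStream() { --liveCount; }

    // Point the existing stream at a new reader instead of allocating again.
    void reset(Reader* r) { reader_ = r; pos_ = 0; }

    // Fetch the next space-separated token; returns false at end of input.
    bool next(std::string& token) {
        const std::string& s = reader_->text;
        while (pos_ < s.size() && s[pos_] == ' ') ++pos_;
        if (pos_ >= s.size()) return false;
        const std::size_t start = pos_;
        while (pos_ < s.size() && s[pos_] != ' ') ++pos_;
        token = s.substr(start, pos_ - start);
        return true;
    }

private:
    Reader* reader_;
    std::size_t pos_;
};
int TokenStream::liveCount = 0;

// Hypothetical stand-in for an analyzer with both ownership conventions.
class Analyzer {
public:
    // Caller owns the returned stream and must delete it.
    TokenStream* tokenStream(Reader* reader) { return new TokenStream(reader); }

    // Analyzer owns the returned stream; the caller must NOT delete it.
    TokenStream* reusableTokenStream(Reader* reader) {
        assert(reader != nullptr);
        if (!cached_) {
            cached_.reset(new TokenStream(reader));  // first call: allocate once
        } else {
            cached_->reset(reader);                  // later calls: reuse
        }
        return cached_.get();
    }

private:
    std::unique_ptr<TokenStream> cached_;  // freed when the analyzer is destroyed
};

// Tokenize several fields through the reusable path and count the tokens.
int invertAll(Analyzer& analyzer, Reader* fields, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        TokenStream* stream = analyzer.reusableTokenStream(&fields[i]);
        std::string token;
        while (stream->next(token)) ++total;
        // No delete here: the analyzer still owns the stream.
    }
    return total;
}
```

With this convention only one stream exists no matter how many fields are inverted, and it goes away with the analyzer. The single cached stream here is not safe to share between threads; with several indexing threads each thread would need its own saved stream, which as far as I remember is how Java Lucene handled it.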
I would like to help with this issue and borrow ideas from the latest Lucene development.
The multi-thread support is a good improvement for my project. The bottleneck is actually inverting the strings, so using more threads will reduce the time cost.