It would be helpful if the QueryEnvironment class had a deleteDocument(const std::vector< lemur::api::DOCID_T > &documentIDs) method. I'm dealing with a number of sharded indexes. I can load them all into a QE and do queries to find the documentID(s) of docs I need to delete, but I don't think that helps me because I'm getting back the 'cooked' documentIDs which can't be used in an IndexEnvironment::deleteDocument(int documentID) call. Instead, I have to load each shard into a QE, find the documentID(s) for docs I need to delete, then init an IndexEnvironment with the shard and call IndexEnvironment::deleteDocument(int documentID). (If I'm missing something that would make this easier, please let me know.)
A really nice enhancement would be something like Lucene's IndexWriter.deleteDocuments(Query query) which would let me search and delete in one method call. Thanks.
The goal of the QueryEnvironment class is to provide read-only access to the underlying repository.
You can accomplish this by changing your code to use QueryEnvironment::addIndex(IndexEnvironment) and keeping the IndexEnvironments open between query calls. See http://lemur.sourceforge.net/indri/classindri_1_1api_1_1QueryEnvironment.html#a00e5012eafbbdff0eee582166d0e35a4
You can uncook the document IDs as long as you know the order in which the IndexEnvironments were added to the QueryEnvironment. See the comment in QueryEnvironment.cpp for the scheme, and the documentLength API implementation for an example of uncooking.
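A minimal sketch of the uncooking arithmetic, assuming the scheme follows the modulo/division pattern in documentLength: cookedID = localID * shardCount + shard, where shard is the zero-based position in the order the indexes were added to the QueryEnvironment. The ShardDocument, uncook, and cook names are hypothetical, and the mapping should be checked against QueryEnvironment.cpp for your Indri version.

```cpp
#include <cassert>

// Hypothetical helper type: where a cooked (QueryEnvironment-wide) document ID lives.
struct ShardDocument {
  int shard;    // zero-based position in the order indexes were added to the QueryEnvironment
  int localID;  // document ID inside that shard, the kind IndexEnvironment::deleteDocument takes
};

// Assumed scheme, mirroring the modulo/division used for uncooking in documentLength:
//   cookedID = localID * shardCount + shard
ShardDocument uncook(int cookedID, int shardCount) {
  assert(shardCount > 0);
  ShardDocument d;
  d.shard = cookedID % shardCount;
  d.localID = cookedID / shardCount;
  return d;
}

// Inverse of uncook, handy for sanity-checking the assumed mapping.
int cook(const ShardDocument& d, int shardCount) {
  return d.localID * shardCount + d.shard;
}
```

For example, with three shards, cooked ID 14 uncooks to shard 2, local ID 4, since 4 * 3 + 2 = 14.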
Applying the same arithmetic to each cooked ID tells you which IndexEnvironment holds the document and which shard-local ID to pass to its deleteDocument call.
When you are all done with your deletions, close each IndexEnvironment to commit the changes.
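The whole delete pass might look roughly like the sketch below: uncook each ID, group the local IDs by shard, delete them, then close every environment to commit. To keep it self-contained, Shard is a hypothetical stand-in for indri::api::IndexEnvironment exposing only deleteDocument(int) and close(), and RecordingShard is a hypothetical dry-run implementation; the cooking scheme (cookedID = localID * shardCount + shard) is the same assumption as above.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical stand-in for indri::api::IndexEnvironment: only the two calls used here.
class Shard {
 public:
  virtual ~Shard() = default;
  virtual void deleteDocument(int localID) = 0;
  virtual void close() = 0;
};

// Hypothetical dry-run shard: records deletions instead of touching a repository.
class RecordingShard : public Shard {
 public:
  std::vector<int> deleted;
  bool closed = false;
  void deleteDocument(int localID) override { deleted.push_back(localID); }
  void close() override { closed = true; }
};

// Deletes documents given cooked QueryEnvironment IDs.
// `shards` must be in the same order the indexes were added to the QueryEnvironment.
// Assumes cookedID = localID * shards.size() + shardPosition.
void deleteCooked(const std::vector<int>& cookedIDs, const std::vector<Shard*>& shards) {
  const int shardCount = static_cast<int>(shards.size());
  assert(shardCount > 0);

  // Group shard-local IDs by shard position, preserving input order within each shard.
  std::map<int, std::vector<int>> byShard;
  for (int cooked : cookedIDs) {
    byShard[cooked % shardCount].push_back(cooked / shardCount);
  }

  for (const auto& entry : byShard) {
    for (int localID : entry.second) {
      shards[entry.first]->deleteDocument(localID);
    }
  }

  // Closing each environment is what commits its deletions.
  for (Shard* s : shards) {
    s->close();
  }
}
```

In real code, a thin adapter deriving from Shard would forward to the corresponding IndexEnvironment's deleteDocument and close calls; running first with RecordingShard lets you check the mapping before modifying any repository.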