With Apache Tika (http://lucene.apache.org/tika/) there is now an official Lucene project that aims to solve the same problem as Docco's indexing tools. To avoid duplicated work, it would make sense to replace all of Docco's indexing with the Tika library.