InfoCrawler lets you crawl and index various types of documents from a range of sources: intranets, public websites, and local or remote file systems. For product information, please see our website at http://www.infocrawler.org/
IndexerX is an open-source print spooler text indexer, in other words an open-source Data Loss Protection (DLP) solution for the print channel.
IndexerX is a configurable Java daemon, built on design patterns, that captures printed documents from the Linux spooler and indexes their text content in a database, so that tokens can later be searched with its integrated search engine.
Developed @ ESIB-USJ.
A DataTable for Java, modeled on the DataTable of ADO.NET. Data can be accessed by index, for example: myTable.getRows().get(1).getString(4); It has a metaManager and can be filled with data from JDBC. It can manage BLOB, CLOB, and all other JDBC types, and can be converted to and from XML.
Toke is a web mining toolkit for Java for exploring, indexing, and searching the web. Toke allows you to crawl public or private websites in order to create web statistics, Pajek web graphs, Lucene indexes, and word frequency files for data clustering.
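To illustrate what a word frequency file might contain, here is a minimal Java sketch that counts tokens in a piece of text. It does not use Toke's API: the class name WordFrequency, the method count, and the tokenization rule (lowercase, split on anything that is not a letter or digit) are assumptions made for this example only.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of Toke.
class WordFrequency {

    // Counts how often each token occurs in the given text.
    // Tokens are lowercased runs of letters and digits (an assumed rule).
    static Map<String, Integer> count(String text) {
        Map<String, Integer> freq = new TreeMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue; // split() yields an empty first token when text starts with a separator
            }
            freq.merge(token, 1, Integer::sum);
        }
        return freq;
    }

    public static void main(String[] args) {
        // Prints one "word<TAB>count" line per distinct token, sorted by word.
        count("The crawler visits the page; the page links out.")
                .forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

A tab-separated word/count listing like the one printed by main is one plausible input format for a clustering tool; the format Toke actually writes may differ.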
I AM File Indexing can index files in given folders and make their content searchable. Written in pure Java, it is meant for people who need very basic website search or multi-file search capability for their Java applications.
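The general idea of indexing the files in a folder and searching their content can be sketched with a small in-memory inverted index. This is not the I AM File Indexing API: the class FolderIndex and its methods addDocument, indexFolder, and search are hypothetical names, and the sketch assumes plain UTF-8 text files and single-word queries.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical class for illustration; not part of I AM File Indexing.
class FolderIndex {

    // Maps each lowercased token to the names of the documents containing it.
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Adds one document's text to the index under the given name.
    void addDocument(String name, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(name);
            }
        }
    }

    // Recursively indexes every regular file under the folder (assumed to be UTF-8 text).
    void indexFolder(Path folder) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(folder)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        for (Path file : files) {
            try {
                byte[] bytes = Files.readAllBytes(file);
                addDocument(file.toString(), new String(bytes, StandardCharsets.UTF_8));
            } catch (IOException e) {
                // Skip files that cannot be read; a real tool would likely log this.
            }
        }
    }

    // Returns the names of documents containing the word (case-insensitive).
    Set<String> search(String word) {
        return postings.getOrDefault(word.toLowerCase(Locale.ROOT), Collections.emptySet());
    }
}
```

A caller would create a FolderIndex, call indexFolder with the folder to scan, and then call search with a single word. Keeping the index in memory keeps the sketch short; a real tool would more likely persist it or delegate to a library such as Lucene.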