Yet another web crawler? Yes, but this one uses the full power of regular expressions to accept or reject, examine or ignore, save or refuse pages. You can also use MIME types to do all this. Powerful and flexible.
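The accept/reject idea described above can be sketched as follows. This is only an illustration, not the crawler's actual API: the `CrawlFilter` class, its field names, and the example patterns are all hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class CrawlFilter:
    """Hypothetical filter: regex rules on URLs plus regex rules on MIME types."""
    accept_url: list = field(default_factory=list)   # URL must match one of these (if any are given)
    reject_url: list = field(default_factory=list)   # URL must match none of these
    accept_mime: list = field(default_factory=list)  # MIME type must fully match one of these (if any are given)

    def allows(self, url: str, mime: str) -> bool:
        # Reject rules take priority over accept rules.
        if any(re.search(p, url) for p in self.reject_url):
            return False
        if self.accept_url and not any(re.search(p, url) for p in self.accept_url):
            return False
        if self.accept_mime and not any(re.fullmatch(p, mime) for p in self.accept_mime):
            return False
        return True


# Example rules (assumed for illustration only).
page_filter = CrawlFilter(
    accept_url=[r"^https?://example\.org/"],
    reject_url=[r"\.(zip|exe)$"],
    accept_mime=[r"text/html", r"application/pdf"],
)
```

A crawler built this way would call `page_filter.allows(url, mime)` once the response headers are known, and fetch or save the page only when it returns `True`.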
Indexes biological data (GenBank sheets, UniProt...) in a Solr indexer with index shard support, and provides a query interface. The project goal is to create a virtual image with an indexer and a web interface to query and visualize biological data.
This project provides cross-forge semantic search for the Qualipso Forge. It integrates the A4 AdvDoc prototype (semantic search GUI and engine) with the A3 homogeneous and heterogeneous cross-forge semantic search capabilities. See Qualipso.org for details.
TeamFound gives a team the capability to share search results without any usage overhead. The toolbar (Firefox and IE) can be used to mark interesting pages and run a full-text search over them, while also showing normal search-engine results for the same keywords.
Roosster.org is a personal "on-demand" search engine. This means it indexes only the items/entries/files/URLs you explicitly tell it to index and provides a full-text search over the indexed items.
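As a rough sketch of what "on-demand" indexing means in general, the following keeps a small inverted index that only contains items added explicitly. It is not Roosster's implementation; the `OnDemandIndex` class and its methods are hypothetical names chosen for illustration.

```python
import re
from collections import defaultdict


class OnDemandIndex:
    """Hypothetical inverted index: nothing is indexed unless add() is called for it."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> set of item ids

    def add(self, item_id, text):
        # Index every word of the text under the given item id.
        for term in re.findall(r"\w+", text.lower()):
            self._postings[term].add(item_id)

    def search(self, query):
        # Return the ids of items containing all query terms (AND semantics).
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result
```

For example, after `index.add("note-1", "Full text search engine")`, a call to `index.search("search engine")` returns a set containing `"note-1"`, while anything never passed to `add()` cannot be found.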
DoxMentor4J is a standalone, cross-platform, Web/Ajax-based documentation library that is fully searchable and may be hosted in the file system, in an archive, or embedded in the Java classpath.
TestEl is a Java-based learning analyzer for HTML (and possibly other) structured documents. It can be trained to detect structures in such documents and renders hits in XML.
This is a web browser for cooperative web surfing. You can log in to a server and form groups to surf together. All members of such a group then see where the others currently are and can mutually ...
This project aims to build a suite of Natural Language Processing tools. Modules will include corpus indexing and access tools, a part-of-speech tagger, tokenisers, text classification software, etc.
This is the official collaborative development environment of the Large Knowledge Collider (LarKC), a platform for massive distributed reasoning that aims to remove the scalability barriers of existing reasoning systems for the Semantic Web.
OpenEphyra is an open framework for question answering (QA). It retrieves answers to natural language questions from the Web and other sources. Visit http://www.ephyra.info/ for more details and information on joining this open research initiative.
A system to perform analysis of large documents for the purpose of cataloging similar documents. Similarity is based upon contextual analysis of these documents done by identifying common words and proper nouns.
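One way to picture similarity based on common words and proper nouns is the sketch below. The description does not say how the system scores similarity, so the Jaccard overlap, the capitalization heuristic for proper nouns, and the `noun_weight` parameter are all assumptions made for illustration.

```python
import re


def common_words(text):
    """Lowercased set of alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def proper_nouns(text):
    """Crude heuristic (assumption): capitalized words that do not start a sentence."""
    nouns = set()
    for sentence in re.split(r"[.!?]+\s*", text):
        for word in sentence.split()[1:]:
            token = word.strip(",;:()\"'")
            if token[:1].isupper():
                nouns.add(token)
    return nouns


def jaccard(a, b):
    """Size of the intersection over size of the union; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(doc_a, doc_b, noun_weight=0.5):
    """Weighted mix of word overlap and proper-noun overlap, in the range 0.0 to 1.0."""
    word_score = jaccard(common_words(doc_a), common_words(doc_b))
    noun_score = jaccard(proper_nouns(doc_a), proper_nouns(doc_b))
    return (1 - noun_weight) * word_score + noun_weight * noun_score
```

Documents could then be cataloged together whenever their pairwise `similarity` exceeds a chosen threshold; the right threshold and weighting would depend on the collection.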
The Wikipedia Miner toolkit provides simplified access to Wikipedia. This open encyclopedia represents a vast, constantly evolving multilingual database of concepts and semantic relations; a promising resource for NLP and related research.
High performance distributed in-memory key/value store
Infinispan is an open source, Java-based data grid platform. ***IMPORTANT*** Starting with Infinispan 5.0.0.FINAL, Infinispan releases are no longer hosted on SourceForge. They can now be found at www.jboss.org/infinispan/downloads
Java GUI that connects to content provider APIs such as Google, Bing, and Wikipedia and implements a local search engine, powered by Lucene, to search different kinds of content (images, videos, articles, files) and display them in an ergonomic OpenGL component.
True-Hybrid Web Search Engine, designed to organize web-based information by making heavy use of a mutually beneficial collaboration between Human and Artificial Intelligence.
Generates a dynamic web application supporting CRUD actions from XML documents. Credits to Ministry of Culture and Communication, France; UNESCO; Ecole Nationale des Chartes, France; PASS-TECH, France.
A universal platform for resource discovery and description that shares XML meta-data over existing peer-to-peer (P2P) networks such as Gnutella and JXTA.
InfraRed is an Information Retrieval system. Its purpose is to allow you to find the information you need from a collection of documents, ignoring the unnecessary details, exactly as if you were taking an infrared picture.
DocInfoRetriever is a web-based document full-text search engine based on Lucene. It allows you to search the contents and metadata of documents. Supported document formats include doc, xls, pdf, odt, jpg, etc., and torrent files.
Common Library is a learning content and document management system that enables dynamic outcome-based learning object alignment through semantic analysis & granular addressability. For implementation questions contact http://www.bluefletch.com.