Search engine and data mining applications and ClueWeb datasets.
The Lemur Project develops search engines, browser toolbars, text analysis tools, and data resources that support research and development of information retrieval and text mining software, including the Indri search engine in C++, the Galago search engine research framework in Java, the RankLib learning to rank library, ClueWeb09 and ClueWeb12 datasets and the Sifaka data mining application.
The Netjuke is a Web-Based Audio Streaming Jukebox powered by PHP 4, a database and all the MP3, Ogg Vorbis and other format files that constitute your digital music collection. Supports images, language packs, multi-level security, random playlists, etc
HTTP Directory Index consiste en un script PHP que actúa como interfaz gráfica amigable para indexar directorios Web.
The Medlane project is an attempt to create a set of tools that will enable librarians to move from the standard MARC (MAchine Readable Cataloging) format to a new library/museum XML format. This move will ensure traditional library/museum data remains
Zope is an open source application server specializing in content management, intranets, and custom web applications. Zope is written in Python and has a large, global community of developers and companies.
Cheshire3 is a fast Z39.50, SRW, XML search engine, written in Python for extensability and using C libraries for speed. Next generation of the Cheshire system (http://cheshire.berkeley.edu) and designed around a distributable, object oriented model.
High-performance software for information retrieval research. Emphasis on semi-structured text retrieval, especially for HTML and XML. The goal is to facilitate information retrieval research by providing an interchangable toolkit of functions.
Omseek has been renamed to Xapian. Xapian is a Search Engine Library, written in C++ with bindings for Perl, Python, PHP, Java, Tcl, C# and Ruby. It allows you to easily add advanced indexing and search facilities to your applications. See xapian.org
Pansophica is an intelligent web search agent that presents results in a dynamic and interactive virtual reality. Twist, fly and play the net.
Values-based Document Analysis: I want to take some rudimentary Document Analysis work that I have done and make it more sophisticated and to use it to analyze (at least) all of the docuemnts of the web for (human) values priorities. The project woul
A C++ library for processing Internet Archive ARC, CDX, and DAT files.
Lucy is a text search engine developed to rapidly index and search large amounts of data. It is capable of standalone searching or being embedded in another application.