The Alpine Network is a peer-based application and network infrastructure designed for distributed resource location, including file/data transfer. Alpine attempts to solve the distributed search/sharing problem using an efficient messaging system.
Cheshire3 is a fast Z39.50, SRW, and XML search engine, written in Python for extensibility and using C libraries for speed. It is the next generation of the Cheshire system (http://cheshire.berkeley.edu) and is designed around a distributable, object-oriented model.
Cicerone is a multi-platform, multi-server, multi-database, web-based corporate information system. Because it is entirely web-driven and accessible through any 4.x web browser, Cicerone lets your company create and maintain information on the fly.
The DesignCMS system is designed specifically for graphic designers who lack the time or inclination to learn server-side scripting such as ASP, and who need to provide professional content management to completely non-technical end users.
Ferret CMS is a Content Management System based on Zope. It is focused on easy administration and fast deployment of a web site. It has a workflow mechanism with roles that can be assigned to backend users.
HTTP Directory Index is a PHP script that acts as a user-friendly graphical interface for indexing web directories.
A command-line HTML parser for use in scripts, which extracts data from an HTML page according to a supplied path and options. It is useful for systematically and periodically parsing pages whose structure is known but whose information keeps changing, such as watching for an item on eBay.
IGLU is a Java class library designed to facilitate sharing of code among Artificial Intelligence/Information Retrieval researchers to illustrate how various problems can be solved in Java. It is developed and maintained by the IGLU Research Group.
High-performance software for information retrieval research, with an emphasis on semi-structured text retrieval, especially for HTML and XML. The goal is to facilitate information retrieval research by providing an interchangeable toolkit of functions.
Written in PHP and designed to maintain a personal database of bookmarks, Linkerdoodle is a simple link organizer.
The Medlane project is an attempt to create a set of tools that will enable librarians to move from the standard MARC (MAchine Readable Cataloging) format to a new library/museum XML format. This move will ensure traditional library/museum data remains
This project aims to build a suite of Natural Language Processing tools. Modules will include corpus indexing and access tools, a part-of-speech tagger, tokenisers, text classification software, etc.
This was a terrible idea, and its implementation is equally terrible.
The Netjuke is a web-based audio streaming jukebox powered by PHP 4, a database, and the MP3, Ogg Vorbis, and other files that make up your digital music collection. It supports images, language packs, multi-level security, random playlists, etc.
A C Implementation of an OAI-PMH Static Repository Gateway.
Distributed search engine for the Internet and intranets. Parallel search in heterogeneous indexes, topic-oriented harvester, CORBA interface to legacy document databases. Document clustering with neural networks.
Omseek has been renamed to Xapian. Xapian is a search engine library, written in C++ with bindings for Perl, Python, PHP, Java, Tcl, C# and Ruby. It allows you to easily add advanced indexing and search facilities to your applications. See xapian.org.
OpenSiteSearch is the new Open Source version of OCLC's original java-based web application for building Z39.50 portals (i.e. virtual union catalogues). This project is specifically aimed at the library community.
Pansophica is an intelligent web search agent that presents results in a dynamic and interactive virtual reality. Twist, fly and play the net.
RAHoo is a self-documenting, easy-to-install, fully customizable web application written in PHP, using MySQL and a suitable web server. Use it to keep a directory structure of links similar to Yahoo or the Google search directories but focuse
Provides efficient, effective implementations of 32- and 64-bit hash functions based on Rabin fingerprints / irreducible polynomials, in Java. Also provides integration with java.security.MessageDigest API.
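The Rabin fingerprint idea mentioned above can be illustrated with a minimal sketch. This is not the project's code: the class name `RabinDigest`, the algorithm name `"Rabin64"`, and the modulus P(x) = x^64 + x^4 + x^3 + x + 1 are assumptions chosen for illustration. The message bits, most significant first, are treated as coefficients of a polynomial m(x) over GF(2), and the fingerprint is m(x) mod P(x), exposed through a subclass of `java.security.MessageDigest`.

```java
import java.security.MessageDigest;

/**
 * Hypothetical sketch of a 64-bit Rabin fingerprint exposed via
 * java.security.MessageDigest. Not the project's actual API.
 */
public class RabinDigest extends MessageDigest {
    // Low-order terms of the assumed modulus P(x) = x^64 + x^4 + x^3 + x + 1;
    // the x^64 term is implicit because the residue has degree < 64.
    private static final long POLY_LOW = 0x1BL;

    // Current residue m(x) mod P(x); bit i holds the coefficient of x^i.
    private long fp = 0L;

    public RabinDigest() {
        super("Rabin64"); // illustrative algorithm name only
    }

    /** Appends one bit: fp <- (fp * x + bit) mod P(x). */
    private void shiftIn(int bit) {
        boolean carry = fp < 0; // coefficient of x^63 before the shift
        fp = (fp << 1) | bit;
        if (carry) {
            fp ^= POLY_LOW; // replace the overflowed x^64 term with its residue
        }
    }

    @Override
    protected void engineUpdate(byte input) {
        // Feed the byte's bits from most significant to least significant.
        for (int i = 7; i >= 0; i--) {
            shiftIn((input >>> i) & 1);
        }
    }

    @Override
    protected void engineUpdate(byte[] input, int offset, int len) {
        for (int i = offset; i < offset + len; i++) {
            engineUpdate(input[i]);
        }
    }

    @Override
    protected byte[] engineDigest() {
        // Emit the 64-bit residue in big-endian order, then reset.
        byte[] out = new byte[8];
        for (int i = 0; i < 8; i++) {
            out[i] = (byte) (fp >>> (56 - 8 * i));
        }
        engineReset();
        return out;
    }

    @Override
    protected void engineReset() {
        fp = 0L;
    }

    @Override
    protected int engineGetDigestLength() {
        return 8;
    }

    /** Convenience helper: the fingerprint of a byte array as a long. */
    public static long fingerprint(byte[] data) {
        RabinDigest d = new RabinDigest();
        d.update(data);
        long v = 0L;
        for (byte b : d.digest()) {
            v = (v << 8) | (b & 0xFF);
        }
        return v;
    }
}
```

Usage follows the standard `MessageDigest` pattern: `MessageDigest md = new RabinDigest(); md.update(data); byte[] fp = md.digest();`. The bit-at-a-time loop is chosen for clarity. Faster implementations typically precompute a reduction table so that a whole byte is processed per step.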
The project provides an incubator for intelligent agent-assisted, AR gaming-oriented BI applications generated through the STALEMATE Knowledge-based System Design Environment (KBSDE), integrating Web-enabled knowledge bases, data mining and warehousing and directed at asset management and investment banking.
The Somewhat Intelligent Proxy (SIP) is an effort to build an open-source, natural-language, web-accessible tool that uses Internet sources to answer your questions.
Search engine and data mining applications and ClueWeb datasets.
The Lemur Project develops search engines, browser toolbars, text analysis tools, and data resources that support research and development of information retrieval and text mining software, including the Indri search engine in C++, the Galago search engine research framework in Java, the RankLib learning to rank library, ClueWeb09 and ClueWeb12 datasets and the Sifaka data mining application.
Values-based Document Analysis: I want to take some rudimentary document analysis work that I have done, make it more sophisticated, and use it to analyze (at least) all of the documents on the web for (human) values priorities. The project woul