Search engine and data mining applications and ClueWeb datasets.
The Lemur Project develops search engines, browser toolbars, text analysis tools, and data resources that support research and development of information retrieval and text mining software, including the Indri search engine in C++, the Galago search engine research framework in Java, the RankLib learning to rank library, ClueWeb09 and ClueWeb12 datasets and the Sifaka data mining application.
The Netjuke is a Web-based audio streaming jukebox powered by PHP 4, a database, and all the MP3, Ogg Vorbis and other format files that constitute your digital music collection. Supports images, language packs, multi-level security, random playlists, etc.
Provides efficient, effective implementations of 32- and 64-bit hash functions based on Rabin fingerprints / irreducible polynomials, in Java. Also provides integration with the java.security.MessageDigest API.
The Alpine Network is a peer-based application and network infrastructure designed for distributed resource location, including file/data transfer. Alpine attempts to resolve the distributed search/sharing problem using an efficient messaging system.
Cheshire3 is a fast Z39.50, SRW, XML search engine, written in Python for extensibility and using C libraries for speed. It is the next generation of the Cheshire system (http://cheshire.berkeley.edu) and is designed around a distributable, object-oriented model.
Cicerone is a multi-platform, multi-server, multi-database, web-based corporate information system like no other. Completely web-driven and accessible through any 4.x web browser, Cicerone allows your company to create and maintain information on the fly
The DesignCMS system is designed specifically for graphic designers who do not have the time or inclination to learn server side scripting such as ASP, and who need to provide professional content management to completely non-technical end users
HTTP Directory Index is a PHP script that acts as a user-friendly graphical interface for indexing Web directories.
High-performance software for information retrieval research. Emphasis on semi-structured text retrieval, especially for HTML and XML. The goal is to facilitate information retrieval research by providing an interchangeable toolkit of functions.
The Medlane project is an attempt to create a set of tools that will enable librarians to move from the standard MARC (MAchine Readable Cataloging) format to a new library/museum XML format. This move will ensure traditional library/museum data remains
This was a terrible idea and is equally terribly implemented.
Omseek has been renamed to Xapian. Xapian is a Search Engine Library, written in C++ with bindings for Perl, Python, PHP, Java, Tcl, C# and Ruby. It allows you to easily add advanced indexing and search facilities to your applications. See xapian.org
Pansophica is an intelligent web search agent that presents results in a dynamic and interactive virtual reality. Twist, fly and play the net.
RAHoo is a self-documenting, easy-to-install, fully customizable web application written in PHP using MySQL and a suitable web server. Use it to keep a directory structure of links similar to Yahoo or the Google search directories but focuse
The project provides an incubator for intelligent agent-assisted, AR gaming-oriented BI applications generated through the STALEMATE Knowledge-based System Design Environment (KBSDE), integrating Web-enabled knowledge bases, data mining and warehousing and directed at asset management and investment banking.
The Somewhat Intelligent Proxy [SIP] is an effort at an open-source, natural language, web accessible instrument which utilizes Internet sources to return answers to your questions.
Values-based Document Analysis: I want to take some rudimentary Document Analysis work that I have done, make it more sophisticated, and use it to analyze (at least) all of the documents of the web for (human) values priorities. The project woul
A hypertext browser written in Java which filters links (e.g. emails, docs or pics) out of .html documents and paints them on screen in hierarchical order. Users get a quick overview of how a website is put together.
Utils for the use of webbase/mifluz
Zope is an open source application server specializing in content management, intranets, and custom web applications. Zope is written in Python and has a large, global community of developers and companies.
A C++ library for processing Internet Archive ARC, CDX, and DAT files.
Lucy is a text search engine developed to rapidly index and search large amounts of data. It is capable of standalone searching or being embedded in another application.
phpApacheBrowser is a File System Management Interface (written in PHP5) that was designed specifically for browsing files and directories on your Apache Web Server. You can also add/remove folders and files through this easy-to-use web application.
Group file share with advanced text parsing capability for easy search
Originally created as a church resource sharing system, phpShare&Search allows users to create accounts, share documents, search documents, and like or report documents. phpShare&Search's power comes from its advanced document parser, which extracts text from .PDF, .TXT, .DOC, and .DOCX files, and from its community features of liking resources and reporting them as inappropriate or spam. Users can also subscribe to weekly updates of new content. Users may choose to download and host/install/configure/modify/manage this code themselves, or contract the code writer to do these functions for them. Contact me for a reasonable quote: eedrew <at> users <dot> sourceforge <dot> net. To support future revisions and/or contribute based on the value you found in this code, check out the External Link drop-down in the menu. Also, if you do not wish to create and maintain your own installation, email email@example.com for a quote on a turnkey solution.