Search engine and data mining applications and ClueWeb datasets.
The Lemur Project develops search engines, browser toolbars, text analysis tools, and data resources that support research and development of information retrieval and text mining software, including the Indri search engine in C++, the Galago search engine research framework in Java, the RankLib learning-to-rank library, the ClueWeb09 and ClueWeb12 datasets, and the Sifaka data mining application.
WACS is a tool for building Adult Web Sites; equally suitable for managing a private collection or building a commercial site. It has many powerful features including dynamic filtering, model catalogs, automatic download, and a powerful search engine.
It shows the files that you have uploaded through it or by FTP to a certain directory. THIS SCRIPT MAY NOT BE LISTED ON ANY OTHER WEBSITE EXCEPT FOR CREAMERSREALM, SOURCEFORGE, HOTSCRIPTS. IF IT IS, IT MUST BE REMOVED OR LEGAL ACTION WILL BE TAKEN.
Written in PHP and designed to maintain a personal database of bookmarks, Linkerdoodle is a simple link organizer.
Ferret CMS is a Content Management System based on Zope. It is focused on easy administration and fast deployment of a web site. It has a workflow mechanism with roles that can be assigned to backend users.
Group file share with advanced text parsing capability for easy search
Originally created as a church resource sharing system, phpShare&Search allows users to create accounts, share documents, search documents, and like or report documents. phpShare&Search's power comes from its advanced document parser, which extracts text from .PDF, .TXT, .DOC, and .DOCX files, and from its community features of liking resources and reporting them as inappropriate or SPAM. Users can also subscribe to weekly updates of new content. Users may choose to download and host/install/configure/modify/manage this code themselves, or contract the code writer to do these functions for them. Contact me for a reasonable quote. eedrew <at> users <dot> sourceforge <dot> net To support future revisions and/or contribute based on the value you found from this code, check out the External Link drop-down in the menu. Also, if you do not wish to create and maintain your own installation, email email@example.com for a quote on a turnkey solution.
HTTP Directory Index is a PHP script that acts as a friendly graphical interface for indexing Web directories.
This project aims to build a suite of Natural Language Processing tools. Modules will include corpus indexing and access tools, a part-of-speech tagger, tokenisers, text classification software, etc.
Values-based Document Analysis: I want to take some rudimentary Document Analysis work that I have done, make it more sophisticated, and use it to analyze (at least) all of the documents of the web for (human) values priorities. The project woul
The project provides an incubator for intelligent agent-assisted, AR gaming-oriented BI applications generated through the STALEMATE Knowledge-based System Design Environment (KBSDE), integrating Web-enabled knowledge bases, data mining and warehousing and directed at asset management and investment banking.
Utils for the use of webbase/mifluz
OpenSiteSearch is the new Open Source version of OCLC's original Java-based web application for building Z39.50 portals (i.e. virtual union catalogues). This project is specifically aimed at the library community.
A C Implementation of an OAI-PMH Static Repository Gateway.
A C++ library for processing Internet Archive ARC, CDX, and DAT files.
The Medlane project is an attempt to create a set of tools that will enable librarians to move from the standard MARC (MAchine Readable Cataloging) format to a new library/museum XML format. This move will ensure traditional library/museum data remains
IGLU is a Java class library designed to facilitate sharing of code among Artificial Intelligence/Information Retrieval researchers to illustrate how various problems can be solved in Java. It is developed and maintained by the IGLU Research Group.
Provides efficient, effective implementations of 32- and 64-bit hash functions based on Rabin fingerprints / irreducible polynomials, in Java. Also provides integration with java.security.MessageDigest API.
This was a terrible idea and is equally terribly implemented.
The Netjuke is a Web-Based Audio Streaming Jukebox powered by PHP 4, a database and all the MP3, Ogg Vorbis and other format files that constitute your digital music collection. Supports images, language packs, multi-level security, random playlists, etc
Omseek has been renamed to Xapian. Xapian is a Search Engine Library, written in C++ with bindings for Perl, Python, PHP, Java, Tcl, C# and Ruby. It allows you to easily add advanced indexing and search facilities to your applications. See xapian.org
Pansophica is an intelligent web search agent that presents results in a dynamic and interactive virtual reality. Twist, fly and play the net.
A hypertext browser written in Java which filters links (e.g. emails, docs or pics) out of .html documents and paints them on screen in hierarchical order. Users get a quick overview of how a website is put together.
RAHoo is a self-documenting, easy to install, fully customizable web application written in PHP using MySQL and a suitable web server. Use it to keep a directory structure of links similar to Yahoo or the Google search directories but focuse
Lucy is a text search engine developed to rapidly index and search large amounts of data. It is capable of standalone searching or being embedded in another application.
Command-line HTML parser to be used in scripts to extract data from an HTML page according to a supplied path and options. Useful for systematic, periodic parsing of pages with known structures where the information keeps changing, such as looking for an item on eBay.