Digital Library Software
Greenstone is a complete digital library creation, management and distribution package created and distributed by the New Zealand Digital Library Project. There are two major versions of the software. Greenstone 3 is under active development, and is recommended for download. We also provide maintenance releases for its forerunner, Greenstone 2. Featured download not what you're looking for? Click "Browse all files" to access binaries and source releases of both versions.
Search engine and data mining applications and ClueWeb datasets.
The Lemur Project develops search engines, browser toolbars, text analysis tools, and data resources that support research and development of information retrieval and text mining software, including the Indri search engine in C++, the Galago search engine research framework in Java, the RankLib learning to rank library, ClueWeb09 and ClueWeb12 datasets and the Sifaka data mining application.
Imgur Gallery Downloader
Users can now search Imgur for any phrase and ImgurDL/Loadur will automatically search for matching images. ImgurDL/Loadur will download the images while displaying the progress to the user.
IRToolkit is an attempt to build and develop a generic search engine that integrates state-of-the-art Information Retrieval (IR) models. Furthermore, it offers a capability to compare the performance (in terms of precision, recall, index size, search response time and so on) between several open source IR applications. If you use the IRToolkit please cite the following work: https://sites.google.com/site/dinhbaduy/bibtex#Dinh-Phdthesis-2012
Hyper Estraier is a full-text search system. It works as with Google, but based on peer-to-peer architecture. Using Hyper Estraier, we can construct a large-scaled search engine with cheap computers.
Google() meets the Matrix. Red Piranha combines Lucene (Searching Ability), XML-RDF (ability to learn), Tomcat (for P2P Power) and Spring (Ease of use) to not only let you find anything, anywhere, but to actually understand what you are looking for.
This is an ***old archive*** of tools developed for facilitating the use of Creative Commons licenses and metadata. --- For the most up to date representation of any of the projects listed here, please see: http://creativecommons.org/project/Developer.
The Medlane project is an attempt to create a set of tools that will enable librarians to move from the standard MARC (MAchine Readable Cataloging) format to a new library/museum XML format. This move will ensure traditional library/museum data remains
Online news and newspaper harvester - Like RSS Newsreader w/ database. National & International News. Very detailed catches hard to find news articles. Allows resposting of summaries w/ comments to Usenet Newsgroups, complex searches & more.
Simple application for downloading pictures from Zerochan.net
Simple java application for downloading high-quality pictures from Zerochan.net. You can find images by size or a tag. It's simple. And flat. All you need to do: download .jar file and run it with Oracle JVM (or any another JVM supporting image decoding)
webExtractor is a Java application that is used for extracting specific content from web based HTML, XML, CSV, and free form text. The extracted data can be used for data gathering and mining purposes.
Analyze and visualization of the social structuring from "lastfm.de" which contained user data, friendslist, groups, group-members and musical neighbours.
Caissfind is expected to be an independent web searching application based on Google API in Java.
=DOES NOT WORK ANYMORE AS DSA HAS PUT CAPTCHA= DSA Practical Driving Test Monitor helps you find any available practical driving test slot within specified date range. Runs on Linux/Mac/Windows and automates your manual task of finding the test slot.
Fire.now is a Firefox plugin that automatically adds your documents to the WhereIsNow latest version discovery service. Everytime you upload a document somewhere, Fire.now integrates the WhereIsNow keys into the file and add it's url to WhereIsNow.
FlixFinder: Tivo & Netflix marriage. Automatically find and schedule upcoming movies in cable/satellite listings based on your netflix queue. Now Greasemonkey script. (Original project deprecated since the tv listings are no longer available).
The LEADERS toolkit is a generic toolset that enables the creation of an online environment which integrates EAD finding aids and EAC authority records with TEI transcripts and digitised images of archival material suitable to a wide variety of archives.
Mac GoogleSeach is an OpenSource effort to implement the Google SOAP APIs on Mac OS X.
OMax is set of projects including real estate crawler and management system.
Omseek has been renamed to Xapian. Xapian is a Search Engine Library, written in C++ with bindings for Perl, Python, PHP, Java, Tcl, C# and Ruby. It allows you to easily add advanced indexing and search facilities to your applications. See xapian.org
An RDF-based post content information system with associated APIs, used to provide intelligent information about the status and accessibility of a web document. Functionality: page re-directs, intelligent 404 handling, Threadneedle connectivity and other
A Distributed Web Search System bassed on current high level nerual network ai for the rating algorithum and search direction.
Dr. Micheal Kay: "Saxon 8.7 is the first release to be released simultaneously by Saxonica on the Java and .NET platforms." MDP: Mission accomplished! Saxon for the .NET platform from Saxonica is now available and supported via the http://saxon.sf.net
Sciense Searcher is a system that lets you search, organize and share bibliographic cites of research articles, books, booklets, collections, manuals, thesis, proceedings, technical reports, unpublished publications and misc.
Values-based Document Analysis: I want to take some rudimentary Document Analysis work that I have done and make it more sophisticated and to use it to analyze (at least) all of the docuemnts of the web for (human) values priorities. The project woul