Crawl-By-Example runs a crawl that classifies the processed pages by subject and finds the best pages according to examples provided by the operator. Crawl-By-Example is a plugin for the Heritrix crawler and was developed as part of the GSoC06 program.
This is an old archive of tools developed to facilitate the use of Creative Commons licenses and metadata. For the most up-to-date version of any of the projects listed here, please see: http://creativecommons.org/project/Developer.
A port of the Google sitemap generator from Python to C# (.NET).
CyberCrawl Meta Edition is a fully modular metasearch solution. The script (written in PHP3) queries multiple search engines (extensible through easy-to-handle includes) and delivers the top 10 results in arrays, separated into link and description.
This project will implement DAV Searching & Locating (DASL), an application of HTTP/1.1 that forms a lightweight search protocol for transporting queries and result sets and allows clients to make use of server-side search facilities.
A Perl module that uses a MySQL database backend to index files, web documents, and database fields. It supports must-include, can-include, and cannot-include words and phrases, as well as boolean (AND/OR) queries, stop words, and stemming.
DOSE: a distributed platform for semantic elaboration that provides semantic services such as automatic annotation of web resources at the document substructure level, semantic search facilities, and semantic annotation storage and retrieval.
=NO LONGER WORKS BECAUSE THE DSA HAS ADDED A CAPTCHA= DSA Practical Driving Test Monitor helps you find any available practical driving test slot within a specified date range. It runs on Linux, Mac, and Windows and automates the manual task of finding a test slot.
DVDWeb is a web service that provides organization, search, and lookup services through a JAX-RPC API. Searches can be run against the built-in database (the user's private list of DVDs, keyed by UPC code) or against other Internet sites such as IMDb or Yahoo.
A content management system that allows web developers to create and organize a collection of URLs (a.k.a. a link farm) using a searchable labeling system.
Yerlie is a graphical tool for analyzing the HTML code of web pages, applying filters, extracting data, and writing the output to XML files.
A clone of demonoid.com.
A community-driven web app project to document and present the tour histories of bands. Tour histories will include information about the venue, date, songs performed, and community members who attended each show.
A distributed, highly customizable web search system designed to support custom parsers that add searchable metadata drawn from a site's content as well as from the URLs of both the site and the referrer.
dCrawler (Distributed Crawler), alias D-HarvestMan (Distributed HarvestMan), is a distributed web crawler implemented in the Python programming language. dCrawler is built on top of HarvestMan, an existing open-source web crawler.
A distributed search portal for common sources of ISBN numbers, with permanent caching of results. The goal is to provide a free, open-source interface for ISBN retrieval using HTML, SQL, or XML that is independent of any particular toolkit or software.
Webhunter is a distributed, multi-threaded web crawler designed for both general indexing and crawling the web for focused content.
This project is an attempt to create a database suitable for storing book reviews and links to book reviews found on the Internet and elsewhere. The intent is to reuse Apache components wherever possible, including the Xindice XML database, Cocoon2 XM
The DocConversion project provides a distributed document conversion solution with a well-defined API that makes use of existing conversion tools and/or a centralized conversion server. This is part of the PRONIR research at http://www.pronir.nl
Domain Name Portfolio is a free PHP- and MySQL-based application to help domain owners better organize their portfolios.
A domain registrar database. Put details, reviews, and ratings of all domain name registrars, and more, on your site. This could be especially helpful for web hosts: if a client is having trouble deciding where to register a domain, point them to the registrar database.
DominoDig is a Perl program designed to facilitate auditing of Lotus Domino web servers. It produces an HTML report listing all the unique .nsf databases it was able to access, as well as IP addresses and email addresses.
An archival information retrieval system, initially developed by the Arquivo Público da Bahia, which is affiliated with the Fundação Pedro Calmon - Centro de Memória e Arquivo Público da Bahia.
The project is no longer maintained. If you want a copy of OpenStats, please contact me.