FlixFinder: TiVo & Netflix marriage. Automatically find and schedule upcoming movies in cable/satellite listings based on your Netflix queue. Now a Greasemonkey script. (Original project deprecated since the TV listings are no longer available.)
DLESE (Digital Library for Earth System Education) is a community-supported digital library dedicated to the collection, enhancement, and distribution of materials that facilitate learning about the Earth. Sponsored by the US National Science Foundation.
Command-line application written in Java for automating downloads and filtering the contents of downloaded files. jDownloader uses a simple script file to configure the downloading and filtering processes.
vbullmin is a data-mining bot for vBulletin boards. vbullmin can retrieve all Forums, Topics, Posts and Users from a vBulletin board and export these values using the phpBB2 database schema. It is intended as a sample for machine learning and uses patterns to extract data.
FathomFive is a classification-aware, Lucene-powered spidering and indexing solution, written in pure Java. It supports a variety of content types and provides an easy-to-use admin interface and a customisable search interface. It spiders from HTTP and OAI.
InfoCrawler allows you to crawl and index various types of documents, accessing data from various resources: intranets, public web sites, and local or remote file systems. For product information please see our website at http://www.infocrawler.org/
OpenOffice Search is a document indexer and search engine for OpenOffice documents. It is Java-based, so it will run on any J2SE-enabled platform, and it uses an embedded Derby (Cloudscape) database.
Biological General Repository for Interaction Datasets (BioGRID) is a curated biological database of protein-protein interactions. It strives to provide a comprehensive resource of protein-protein interactions for all major species.
The search aggregator allows users to initiate searches across multiple applications and receive aggregated results. This project is based on Lucene, written in Java, exposes web and plugin interfaces, and supports the OpenSearch and JSON standards.
JxtASK is a P2P system that aims to search, download and share academic content hosted on websites that join the JxtASK community. Joining is simple: site admins must generate (even automatically) an XML catalog which describes the files.
Nutch is an open source search engine. WebSphere Information Integrator Content Edition (IICE) is an IBM product that is used to integrate enterprise content management systems. Nutch-IICE is a plugin for Nutch and an enterprise content search solution.
DOSE: a distributed platform for semantic elaboration that provides semantic services such as automatic annotation of web resources at the document substructure level, semantic search facilities, semantic annotation storage and retrieval.
OpenMKS is a search & navigational tool for large multimedia collections. With pluggable functionality and a core subsystem supporting the Z39.50 ZING Community SRW search & retrieval specification, it can be run either as a Servlet or as a Web Service.
Crawl-By-Example runs a crawl that classifies the processed pages by subject and finds the best pages according to examples provided by the operator. Crawl-By-Example is a plugin for the Heritrix crawler and was developed as part of the GSoC06 program.
This forum software is a Java-based discussion forum that uses JDBC to store data in a database. It is available in different languages and has features for easy integration into a site and easy administration of the forum.
The Informa library provides a convenient Java API for handling news channels and metadata about them. Different syntax formats (RSS 0.91, 1.0, 2.0 and Atom 0.3, 1.0) for feeds are supported. Also support for channel information descriptions (OPML) avail
Aracnis is a Java-based framework for building distributed web spiders. These spiders can be used to accomplish a variety of tasks, such as screen scraping and link integrity checking.
OJAX provides a meta-search service with a highly dynamic AJAX-based user interface, an OAI-PMH harvester that harvests multiple repositories into a single Lucene index, and an easy-to-use, highly discoverable user interface for searching that index.
jGetFile is a command-line scriptable recursive file downloader for the web. Where other downloaders fail, jGetFile succeeds in downloading the files you want with simplicity and ease of use.
list2db reads digested email files generated by the Mailman mailing list software and converts them into SQL for a relational database. The project also includes a PHP frontend for users to search and browse archived list emails.
Cross-platform searchable CD-ROM. Vicaya is a search engine and indexing tool for use on a local file system or CD-ROM, written in Java and based on Apache Nutch and Tomcat. The goal is to replicate a website on a CD-ROM to be used on any platform.