webspider provides a mechanism to retrieve content from the web. With the extended classes, you can do the following things: 1. grab URLs from a specified base URL 2. analyze the contents of a list of URLs 3. download specific files from the web
DLC - HTTP link checker written in Perl. Can generate HTML output for easy checking of results and process a link cache file to hasten multiple requests. Initially created as an extension to Public Bookmark Generator (PBM); can be used alone.
The PHPBible project writes and maintains PHPScripture, which is a simple, but powerful PHP script that allows easy navigation through the text of the Bible. It has a simple interface, and is easy to install on any web site.
Websitemirror is a small program that downloads complete websites into a specified directory for offline viewing.
Bibliophile is a loose grouping of independent open-source or GPL bibliographic systems, and aims to promote discussion, standards, and the development of common utilities.
Seeks is a free and open technical design and application for enabling social web search. Its specific purpose is to group users whose queries are similar so they can share both the query results and their experience of those results.
Unlock Google's potential. Use this application to find information that is more relevant to your search. It enables enhanced searching on Google without the need for long query modifiers.
Project moved to GitHub! https://github.com/carrot2/carrot2 Carrot2 is an Open Source Search Results Clustering Engine. It can automatically organize small collections of documents, e.g. search results, into thematic categories. Carrot2 integrates very well with both Open Source and proprietary search engines.
A document summarization system. After documents are added to the system, user queries generate a summary document containing the relevant information available to the system.
Script for automated downloading of PDFs from Guardian's subscription service (guardian.newspaperdirect.com, used to be digital.guardian.co.uk)
IDRA (InDexing and Retrieving Automatically) is a tool for indexing a wide range of text formats (TXT, DOC, PDF) and image annotation files (XML). It supports query-based searching, visualizing an index, saving an index for re-use, evaluation, etc.
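The core of any such indexing tool is an inverted index: each term maps to the set of documents that contain it, and a conjunctive query intersects those sets. The sketch below illustrates that idea only; it is a hypothetical minimal example, not IDRA's actual code, and format parsing (DOC, PDF, XML) is out of scope, so documents are plain strings.

```java
import java.util.*;

// Minimal inverted-index sketch (hypothetical, not IDRA's implementation):
// each term maps to the sorted set of document ids containing it.
public class MiniIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    public void addDocument(int docId, String text) {
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // AND query: returns the ids of documents containing every query term.
    public Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String term : query.toLowerCase().split("\\W+")) {
            Set<Integer> docs = postings.getOrDefault(term, Set.of());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? Set.of() : result;
    }
}
```

Saving the index for re-use, as IDRA does, would amount to serializing the postings map to disk.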
MeGaSearch is a customizable desktop application for searching files on file-hosting websites such as Rapidshare and MegaUpload. You can also add your desired websites and file types from the Wizards menu to optimize and customize your search.
This is a free RSS client with several nice features. You can create and delete groups to sort your feeds, and add favorite links. It downloads and parses dozens of feeds in seconds using threading. All links open in the built-in browser.
Google() meets the Matrix. Red Piranha combines Lucene (searching ability), XML-RDF (the ability to learn), Tomcat (for P2P power), and Spring (ease of use) to not only let you find anything, anywhere, but to actually understand what you are looking for.
StrangeSearch is a LAN (NetBios/Samba/Windows Sharing/FTP) search engine. It indexes available files on a network and has a web-based CGI search interface for users to search for files.
WebCollector is an open-source web crawler framework based on Java. It provides simple interfaces for crawling the Web; you can set up a multi-threaded web crawler in less than 5 minutes. GitHub: https://github.com/CrawlScript/WebCollector Demo: https://github.com/CrawlScript/WebCollector/blob/master/YahooCrawler.java
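The basic pattern behind such a framework is a multi-threaded breadth-first crawl: a frontier of URLs, a shared visited set, and a thread pool fetching pages in parallel. The sketch below shows that pattern using only the JDK; it is not WebCollector's own API, and pages are simulated with an in-memory link graph instead of real HTTP fetches.

```java
import java.util.*;
import java.util.concurrent.*;

// Minimal multi-threaded breadth-first crawl sketch (JDK only, not
// WebCollector's API). The "web" is an in-memory map of url -> outgoing links.
public class MiniCrawler {
    private final Map<String, List<String>> linkGraph;
    private final Set<String> visited = ConcurrentHashMap.newKeySet();
    private final ExecutorService pool;

    public MiniCrawler(Map<String, List<String>> linkGraph, int threads) {
        this.linkGraph = linkGraph;
        this.pool = Executors.newFixedThreadPool(threads);
    }

    public Set<String> crawl(String seed) {
        Queue<String> frontier = new ConcurrentLinkedQueue<>(List.of(seed));
        try {
            while (!frontier.isEmpty()) {
                // Fetch one depth level of the frontier in parallel.
                List<Callable<List<String>>> tasks = new ArrayList<>();
                String url;
                while ((url = frontier.poll()) != null) {
                    final String u = url;
                    if (visited.add(u)) {
                        // A real crawler would perform an HTTP fetch here.
                        tasks.add(() -> linkGraph.getOrDefault(u, List.of()));
                    }
                }
                for (Future<List<String>> f : pool.invokeAll(tasks)) {
                    try {
                        frontier.addAll(f.get());
                    } catch (ExecutionException e) {
                        // A real crawler would log and skip failed fetches.
                    }
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return visited;
    }
}
```

A framework like WebCollector wraps this loop behind its own interfaces so the user only supplies seeds and a page-visit callback.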
WebExtractor360 is a free and open-source web data extractor. It allows you to extract images, phrases, URLs (links), URLs (keywords), email addresses, phone and fax numbers, and any other information on the web by specifying a regular expression. See http://www.webextractor
A command-line application written in Java, useful for automating the download process and filtering the contents of downloaded files. jDownloader uses a simple script file to configure the downloading and filtering processes.
phpMonAnnuaire is a directory written in PHP. It can use a MySQL or LDAP database. You can adapt it to your own database or use the MySQL tables that come with it.
WACS is a tool for building Adult Web Sites; equally suitable for managing a private collection or building a commercial site. It has many powerful features including dynamic filtering, model catalogs, automatic download and powerful search engine.
A free site search engine script built with PHP and Ajax.
ALTSE is an alternative search engine technology. It can index up to a couple million Web pages.
Search for torrents around the world with this easy-to-use Java search application. It is easy to configure which websites to search.
This is an ***old archive*** of tools developed for facilitating the use of Creative Commons licenses and metadata. --- For the most up to date representation of any of the projects listed here, please see: http://creativecommons.org/project/Developer.
Search Google with precision.
This little tool helps you with your Google searches. It uses Google search operators to search precisely for the terms as you enter them in the program. It works with almost all the search operators Google provides. You can search for files, sites/domains, or within the URL, title, or text. To provide the best results, some fields disable others, since some Google operators are not meant to be combined. It launches the search results in your default browser, so you can keep working with the same browser.
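Assembling such a query amounts to joining operator-prefixed terms (`site:`, `filetype:`, `intitle:`, quoted phrases) and URL-encoding the result. The operator names below are Google's documented search operators; the builder class itself is a hypothetical sketch, not this tool's code.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.*;

// Hypothetical sketch of assembling a precise Google query from search
// operators and turning it into a URL for the default browser.
public class QueryBuilder {
    private final List<String> parts = new ArrayList<>();

    public QueryBuilder phrase(String p)   { parts.add("\"" + p + "\""); return this; }
    public QueryBuilder site(String s)     { parts.add("site:" + s); return this; }
    public QueryBuilder filetype(String f) { parts.add("filetype:" + f); return this; }
    public QueryBuilder intitle(String t)  { parts.add("intitle:" + t); return this; }

    public String build() {
        return String.join(" ", parts);
    }

    public String toUrl() {
        return "https://www.google.com/search?q="
                + URLEncoder.encode(build(), StandardCharsets.UTF_8);
    }
}
```

A desktop tool would then hand the resulting URL to the system's default browser, e.g. via `java.awt.Desktop.getDesktop().browse(...)`.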