Arachnid is a Java-based web spider framework. It includes a simple HTML parser object that parses an input stream containing HTML content. Simple Web spiders can be created by sub-classing Arachnid and adding a few lines of code called after each page
Utility program that can aid system administrators in searching for unblocked websites. The tables it generates can be exported to CSV files for further convenience.
Buzzsearch is a Perl- and MySQL-based SMB/FTP search engine that originated at Georgia Tech. It should run on any UNIX machine with Samba; however, I have only tested it on Linux.
Cascade allows you to easily maintain a web-based Yahoo-like directory of resources using web-based forms.
Directory listing and download system for a PHP-capable web server. Much like Apache's fancy auto-index, but much more versatile. This is the script that powers: http://downloads.unrealadmin.org
CitemaPP is a Google Sitemap generator written in C++. Instead of crawling the html-doc directory on your server's file system, CitemaPP crawls your server's content over HTTP.
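As a rough illustration of the crawl-over-HTTP approach described above, the following Python sketch follows same-host links from a start URL and emits a sitemap document. It is not CitemaPP's code (CitemaPP is written in C++), and the names `crawl`, `to_sitemap`, `http_fetch`, and the `limit` parameter are all hypothetical choices made for this example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen
from xml.etree import ElementTree as ET


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def http_fetch(url):
    """Default fetcher (an assumption of this sketch): plain HTTP GET."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=http_fetch, limit=100):
    """Breadth-first crawl of pages on the same host as start_url."""
    host = urlparse(start_url).netloc
    seen = [start_url]
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that cannot be retrieved
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            absolute = urldefrag(urljoin(url, href)).url
            if len(seen) >= limit:
                break
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.append(absolute)
                queue.append(absolute)
    return seen


def to_sitemap(urls):
    """Serialize URLs as a minimal sitemaps.org <urlset> document."""
    urlset = ET.Element(
        "urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    )
    for u in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = u
    return ET.tostring(urlset, encoding="unicode")
```

Passing `fetch` as a parameter keeps the crawl logic testable without network access; calling `to_sitemap(crawl("http://example.com/"))` would use the default HTTP fetcher.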
Contineo is a Web-based Document Management System (DMS). Features: folder organization, document versioning, bulk import, import from mailbox. NOTE: this project has been discontinued in favor of LogicalDOC http://sourceforge.net/projects/logicaldoc
A distributed search portal of common sources of ISBN numbers, with permanent caching of results. It aims to provide a free, open-source interface for ISBN retrieval using HTML, SQL, or XML that is independent of any toolkit or software.
Domain registrar database. Have details, reviews, ratings of all domain name registrars and more on your site. Could be helpful for web hosts, especially. If a client is having trouble deciding where to register their domain, point to the registrar db.
The Exteca platform is an ontology-based technology written in Java for high-quality knowledge management and document categorisation. It can be used in conjunction with search engines.
FTPSearch/Agent is a fully functional FTP indexing and searching engine for medium-sized local networks (20-200 servers). Its unique associative search extension lets you gather more relevant results. FTPSearch/Agent is written in Java and PHP and uses MySQL
Fusker is a tool to create entire image galleries from a single specially constructed URL.
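To make the idea of a "specially constructed URL" concrete, here is a minimal Python sketch that expands bracketed numeric ranges into a list of image URLs. The `[start-end]` syntax, the function name `expand`, and the example host are assumptions for illustration; the listing above does not document Fusker's actual pattern format.

```python
import re
from itertools import product

# Assumed pattern syntax: a numeric range such as [01-10] inside the URL.
RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand(url):
    """Expand every [start-end] range in url into the full list of URLs."""
    # With two capture groups, split() yields:
    # literal, start, end, literal, start, end, ..., literal
    parts = RANGE.split(url)
    literals = parts[0::3]
    starts = parts[1::3]
    ends = parts[2::3]

    ranges = []
    for start, end in zip(starts, ends):
        width = len(start)  # preserve zero padding, e.g. 01, 02, ...
        ranges.append(
            [str(n).zfill(width) for n in range(int(start), int(end) + 1)]
        )

    urls = []
    for combo in product(*ranges):
        pieces = [literals[0]]
        for value, literal in zip(combo, literals[1:]):
            pieces.append(value)
            pieces.append(literal)
        urls.append("".join(pieces))
    return urls
```

For example, `expand("http://example.com/img[08-10].jpg")` yields the three URLs ending in `img08.jpg`, `img09.jpg`, and `img10.jpg`, which a gallery page could then render as image tags.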
A command-line utility to search Google. Note that Google has disabled the API this depended on, so it no longer functions.
HAE is a PHP-based file system explorer. It provides a user-friendly interface, similar to that of a desktop environment, for browsing the content of an HTTP server.
IRC Web Search is a search engine for IRC logfiles. This project is now superseded by the IRC Collective; please visit http://www.irc-collective.org/ for more information.
The JSearch Project aims to provide the Internet with a generic Java-based interface for search engines. It consists of a core interface, search engine adaptors, a sort/merge module, and a JSP-based GUI.
A web-based, automated search engine that responds to user feedback through the use of artificial intelligence.
A fully automatic and transparent tool to measure programming language popularity on the Internet.
Mu (My Uploader) is a simple PHP CMS allowing users to upload files to the server hosting Mu.
Docindex is an open, extensible system that permits web-based catalog searches and access-controlled fetch from a group of document repositories on multiple CVS (or other) servers. When originally written, it filled a temporary need. Now, it seems to hav
This Knowledge Base is heavily influenced by the FOM, but is PHP+MySQL based. It is currently under development, but an example of progress (a few steps behind) is available at http://kb.zeroasterisk.com. If you want to play or help, contact me.
Pagecast makes it easy to submit lists of URLs to Internet search engines. Advanced features include the ability to check the URLs for certain problematic conditions. Designed to be simple to use, but effective.
Personal CD Database is PHP-based software that lets you gather information about your CD and DVD collection. Add, update, browse, and find your disks faster than ever before!
PyEsp - Enhanced/Evolving/Extensible Semantic Profiling. This Python program sorts and filters search results by applying semantic profiling to web pages. The program learns the user's preferences, and profiling is done on the client computer.
ROADS was a set of Perl tools for helping people to catalogue Internet resources, e.g. websites.