Corbie is a full-text information retrieval system written in Java and licensed under the LGPL.
The Cornell Web Lab Collaboration Server is a suite of tools and services for GUI-based extraction, analysis and sharing of archived web data. See http://weblab.infosci.cornell.edu/ and http://www.cs.cornell.edu/~weigel for details about the project.
A Craigslist search and alert utility for continuous, non-interactive searching of Craigslist postings. It monitors Craigslist for rarely posted items you are looking for and emails you the search results as they appear.
Crawler is a bare-bones spider designed to quickly and effectively build an index of all files and pages on a given Web site, as well as the link relationships (both incoming and outgoing) between pages. More open source at https://github.com/fcc.
Crawler.NET is a component-based distributed framework for web traversal intended for the .NET platform. It comprises loosely coupled units, each realizing a specific web crawler task. The main design goals are efficiency and flexibility.
A system to assist with the management of a local Buddhist study library. It downloads and installs chosen documentation (sutras, etc.), indexes it, and provides a web interface. Written in Python and HTML. Requires a web server and Namazu. It can be used for other documents, too.
A search engine for indexing and searching books and documents. It lets you search the contents of each book. Currently supported formats are PDF, HTML and TXT. An autosuggest feature is also available.
A clone of demonoid.com
The DesignCMS system is designed specifically for graphic designers who do not have the time or inclination to learn server-side scripting such as ASP, and who need to provide professional content management to completely non-technical end users.
A Java (SWT) application that creates a searchable index of Del.icio.us bookmarks. The application also indexes the contents of the pages the bookmarks link to, for more robust searching.
Simple File Indexer for your website
This is a simple PHP script designed to index all the files in the root folder of your website. You can exclude certain files, such as the index and other pages of the website. The script is easy to edit, configure and modify.
DocInfoRetriever is a web-based full-text document search engine based on Lucene. It allows you to search the contents and metadata of documents. Supported document formats include doc, xls, pdf, odt, jpg, etc., as well as torrent files.
This project is a modification of the DocMan 1.4.0 component for Joomla 1.5.9 that adds full-text search over files of the following types: PPT, PDF, DOC, TXT, HTML, PS.
DominoDig is a Perl program designed to help audit Lotus Domino web servers. It produces an HTML report listing all the unique .nsf databases it was able to access, as well as IP addresses and email addresses.
A system to retrieve and display in 3D the structure of the Internet (or as much as can be analysed). It should allow for an interesting perspective of the way pages are linked and clustered. It will hopefully also provide a more intuitive way of browsing
A content management system for collecting, editing and searching profiles. It supports student job-seeker profiles out of the box, and any type of structured data (recipes, bug tracking, apartments, job postings, for-sale listings, etc.) with customization.
EpiSPIDER is a public health and healthcare application that integrates information from distributed electronic resources, performs semantic processing of text, and represents information visually using graphic display, GIS and mapping technologies.
European Anti-Corruption Centre Tools & Apps for Developers + End Users
The European Anti-Corruption Centre™ (EurACC.eu) is an open-source-based, sustainable public service: a secure online portal (https://) that is globally searchable, works across languages, and runs in standard browsers. It provides unedited, uncommented, full-text original source materials, in multiple languages and from numerous countries, that shed light on the problems, issues, case studies, ongoing investigations and published reports concerning all types of private, commercial, political, election-related and public-sector corruption. Topics include fraud, bribery, kickbacks, malfeasance, misfeasance, nonfeasance, white-collar crime, bidding fraud in commercial, industrial and public-sector contracting, tax fraud, money laundering, tunneling, asset stripping, phoenixing, insider trading, ethics violations, and other serious frauds and criminality connected to organized crime, corrupt firms and individuals, whether based in or operating directly or indirectly in or through Europe.
A PHP photo album that stores all non-graphic data in MySQL. You can write and display comments on pictures in multiple languages. Exporia's directory structure is straightforward, and it works with PHP's safe mode.
FTPSearch is a Java-based program that gathers URLs from many FTP servers and stores them in a database to provide a search function. Currently Postgres and MySQL are available for data storage.
The Fake File project aims to provide an accurate database of malicious files that may damage and/or disrupt publicly accessible computer systems or networks.
Fast SMB Search is a search engine for local SMB-based networks (e.g. Windows networks). Its key feature is the ability to quickly search for a file in a large network. It also supports FTP search, so the project name is not strictly accurate.
A fusion of several open-source libraries and a web application to parse and filter RSS feeds, as well as generate RSS feeds based on user-defined search terms.
Short Python script designed to wget lists of websites and concatenate them into "summaries" for offline viewing.
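The fetch-and-concatenate idea behind the last entry can be sketched roughly as follows. This is not that project's code: the names `fetch` and `build_summary`, the separator format, and the use of `urllib` in place of wget are all assumptions made for illustration.

```python
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one page and return its body as text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def build_summary(urls, fetcher=fetch):
    """Concatenate the pages at `urls` into one text for offline viewing.

    `fetcher` is injectable so the function can be tried without a network.
    """
    parts = []
    for url in urls:
        try:
            body = fetcher(url)
        except OSError as exc:  # urllib's URLError is a subclass of OSError
            body = f"[could not fetch: {exc}]"
        # Assumed layout: a header line naming the source, then the page body.
        parts.append(f"===== {url} =====\n{body}\n")
    return "\n".join(parts)
```

Writing the result of `build_summary(list_of_urls)` to a file would give one "summary" per list of websites; a page that fails to download is noted in place rather than aborting the run.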