CLucene is a C++ port of Lucene, the high-performance, full-featured text search engine written in Java. CLucene is faster than Lucene because it is written in C++.
An open source search engine with RESTful API and crawlers
OpenSearchServer is a powerful, enterprise-class search engine program. Using the web user interface, the crawlers (web, file, database, etc.) and the client libraries (REST/API, Ruby, Rails, Node.js, PHP, Perl), you will be able to quickly and easily integrate advanced full-text search capabilities into your application: full-text search with basic semantics, join queries, boolean queries, facets and filters, document (PDF, Office, etc.) indexing, web scraping, etc. OpenSearchServer runs on Windows and Linux/Unix/BSD.
PHPCrawl is a highly configurable web crawler/web spider library written in PHP. It supports filters, limiters, cookie handling, robots.txt handling, multiprocessing and much more.
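As a rough illustration of what robots.txt handling means for a crawler in general, the sketch below checks URLs against a set of robots.txt rules using Python's standard library. It is not PHPCrawl's API; the function name `build_checker`, the user agent `ExampleBot` and the example.com URLs are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser


def build_checker(robots_lines, user_agent="ExampleBot"):
    """Return a function that says whether a URL may be fetched.

    robots_lines is the robots.txt content as a list of lines.
    user_agent is a hypothetical crawler name used only for this sketch.
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)  # load the rules without any network access

    def allowed(url):
        return parser.can_fetch(user_agent, url)

    return allowed


# Hypothetical rules: everything under /private/ is off limits to all agents.
allowed = build_checker(["User-agent: *", "Disallow: /private/"])
blocked_result = allowed("http://example.com/private/page.html")
open_result = allowed("http://example.com/public/page.html")
```

A crawler would typically run a check like this before each request and skip any URL for which it returns False.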
Digital Library Search Engine
SeerSuite is an application toolkit for digital libraries and search engines, such as CiteSeerX. CiteSeerX has moved to GitHub; please get the latest code from: https://github.com/SeerLabs/CiteSeerX
The stuff here has no documentation and some of it may never be completed. This is my playground, use at your own risk.
Web data extraction (web data mining, web scraping) tool. It leverages well-proven XML and text processing technologies in order to easily extract useful data from arbitrary web pages.
Aperture is a Java framework for extracting and querying full-text content and metadata from various information systems (file systems, web sites, mail boxes, ...) and the file formats (documents, images, ...) occurring in these systems.
Forked from https://sf.net/p/fmd/
The Free Manga Downloader (FMD) is an open source application written in Object Pascal for managing and downloading manga from various websites. This is a mirror of the main repository on GitHub. For feedback and bug reports, visit https://github.com/riderkick/FMD
A PHP application that provides a web-based graphical interface similar to an Apache directory listing. Functions: copy, move, delete, rename files, etc. For more detail, please go to the official site.
Imgur Gallery Downloader
Users can now search Imgur for any phrase and ImgurDL/Loadur will automatically search for matching images. ImgurDL/Loadur will download the images while displaying the progress to the user.
Fusker is a tool to create entire image galleries from a single specially constructed URL.
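The idea of one URL standing for a whole gallery can be sketched as pattern expansion. The example below assumes a bracketed numeric range such as [01-03] inside the URL; this syntax, the function name `expand_url` and the example.com address are assumptions for illustration, since the exact format Fusker accepts is not described here.

```python
import itertools
import re

# Assumed pattern syntax: "[START-END]" marks an inclusive numeric range.
RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_url(pattern):
    """Return every URL produced by expanding each [START-END] range."""
    # With two capture groups, split() yields:
    # literal, start, end, literal, start, end, ..., literal
    parts = RANGE.split(pattern)
    literals = parts[0::3]
    ranges = []
    for start, end in zip(parts[1::3], parts[2::3]):
        width = len(start)  # keep zero padding, e.g. "01" stays two digits
        ranges.append(
            [str(n).zfill(width) for n in range(int(start), int(end) + 1)]
        )
    urls = []
    for combo in itertools.product(*ranges):
        pieces = [literals[0]]
        for value, literal in zip(combo, literals[1:]):
            pieces.append(value)
            pieces.append(literal)
        urls.append("".join(pieces))
    return urls


gallery = expand_url("http://example.com/img[01-03].jpg")
```

Each expanded URL would then be placed in an image tag to build the gallery page; a pattern without any range expands to itself.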
===NOTICE=== After releasing a few updates, but far fewer than we wanted, we’ve made the decision to stop the OptimizeGoogle Project. The reason for the decision is that there were not enough people on the team to keep it going. Google is changing things every day, and it has become more and more frustrating to watch the functions break piece by piece. The code will remain GPL; perhaps another person or team is interested in picking this up. For now, thank you for all your patience, feedback and support. Description: OptimizeGoogle is a Firefox extension that enhances Google search results and other pages by adding extra information and removing unwanted information. It was created to maintain and improve CustomizeGoogle, which seems to have been abandoned.
Seeks is a free and open technical design and application for enabling social websearch. Its specific purpose is to regroup users whose queries are similar so they can share both the query results and their experience on these results.
Bibliophile is a loose grouping of independent OS or GPL bibliographic systems and aims at promoting discussion, standards and the development of common utilities.
TEK empowers low-connectivity communities by providing a full Internet experience using email as the transport mechanism.
OpenEphyra is an open framework for question answering (QA). It retrieves answers to natural language questions from the Web and other sources. Visit http://www.ephyra.info/ for more details and information on joining this open research initiative.
The ht://Dig system is a complete indexing and searching system for a domain or intranet. This system is not meant to replace the need for powerful internet-wide search systems like Lycos, Infoseek, Google and AltaVista.
JobHunter aims to automatically find job information from some big sites such as chinahr, 51job, zhaopin and so on. JobHunter looks up the email address of each job item and automatically sends an application email to it.
A function-testing, performance-measuring, site-mirroring web spider that is widely portable and capable of using scenarios to process a wide range of web transactions, including SSL and forms.
WACS is a tool for building Adult Web Sites; equally suitable for managing a private collection or building a commercial site. It has many powerful features including dynamic filtering, model catalogs, automatic download and a powerful search engine.
Quran Search Engine API
Alfanous (The Lantern - الفانوس) is an Arabic search engine API that provides simple and advanced search in the Holy Quran, along with more features and many interfaces...
A multi-threaded web spider that finds free porn thumbnail galleries by visiting a list of known TGPs (Thumbnail Gallery Posts). It optionally downloads the located pictures and movies. A TGP list is included. Public domain Perl script running on Linux.
MovieGrabber is a fully automated way of downloading
MovieGrabber has now moved to GitHub! https://github.com/binhex/moviegrabber
PRO-Search is a crawler of FTP servers, SMB shares, HTTP, DC++ networks, ... with a powerful web search and navigation interface
CMS-Bandits is a set of PHP scripts with an online HTML editor, calendar, search engine, RSS reader, revision log, personal nickpage, comment system, web crawler and more.