Hunspell is a spell checker and morphological analyzer library and program designed for languages with rich morphology and complex compounding or character encoding. Hunspell interfaces: Curses, Ispell-compatible pipe interface, OpenOffice.org UNO module.
Digital Library Software
Greenstone is a complete digital library creation, management and distribution package created and distributed by the New Zealand Digital Library Project. There are two major versions of the software. Greenstone 3 is under active development, and is recommended for download. We also provide maintenance releases for its forerunner, Greenstone 2.
An open source search engine with RESTful API and crawlers
OpenSearchServer is a powerful, enterprise-class search engine program. Using the web user interface, the crawlers (web, file, database, etc.) and the client libraries (REST/API, Ruby, Rails, Node.js, PHP, Perl), you can quickly and easily integrate advanced full-text search capabilities into your application: full-text search with basic semantics, join queries, boolean queries, facets and filters, document (PDF, Office, etc.) indexing, web scraping, etc. OpenSearchServer runs on Windows and Linux/Unix/BSD.
Transforms Netscape bookmarks into a Yahoo-like website with Slashdot-like news.
ht://Check is more than a link checker. It's particularly suitable for checking broken links, anchors and web accessibility barriers, but retrieved data can also be used for Web structure mining. Uses a MySQL backend. Derived from ht://Dig.
ARADO RSS Feed Reader is a URL database for web search and RSS feed reading. It saves the bookmarks and RSS feeds you add and syncs the newest URLs with your connected devices. Store and search all your URLs in ARADO. It is built with C++ and the Qt framework.
SWISH++ is a Unix-based file indexing and searching engine (typically used to index and search files on web sites). It's very fast, robust, and can index several file formats including text, HTML, mail, news, LaTeX, and MP3, and apply filters.
Grub is a distributed internet crawler/indexer designed to run on multiple platforms, interfacing with a central server/database.
Seeks is a free and open technical design and application for enabling social web search. Its specific purpose is to group users whose queries are similar so they can share both the query results and their experience with these results.
Lurker is a mailing list archiver designed for capacity, speed, simplicity, and configurability, in that order. Noteworthy features include Google-style searching on all fields, chronology-preserving threads, multilingual support, and attachment support.
Xyzse implements the essential functions of a general web search engine. It is developed for students and anyone else interested in search engines. More features will be added in future releases.
Amberfish is general-purpose text retrieval software. It supports nested queries of semi-structured text in XML format and traditional unstructured searching.
The Swishd cluster system is an application that allows swish-e to scale out to multiple machines.
ASPseek is a full-featured medium-to-large scale SQL-based Internet search engine. It consists of an indexing robot, search daemon and search frontend (CGI program). These programs are written in C++ using the STL library.
A new web crawler with a sophisticated search process specialized by language.
HooDoo is designed to provide most of the same functionality as Google, but available to anyone for their own websites.
A high-speed FTP search engine that uses RevertIndex to search.
A full-featured Internet search engine, specifically designed to power vertical search, enterprise search, or a knowledge area search. Can index 2.5 million documents per 24 hours on a single Dell server. Clean C++/STL code written from scratch.
This project has been closed because of new releases and ongoing activity in similar tools such as swish++ and swish-e.
The Alpine Network is a peer-based application and network infrastructure designed for distributed resource location, including file/data transfer. Alpine attempts to resolve the distributed search/sharing problem using an efficient messaging system.
An automated client developed for downloading sequenced files.
The goal of this project is to develop a protocol and a server built on top of TCP/IP for managing bookmarks over the network. The protocol will be based on XBEL, the XML Bookmark Exchange Language.
arachne is a C++ library for HTTP crawling and for link, text, and metadata extraction, designed to run in a distributed environment.
CitemaPP is a Google Sitemap generator written in C++. Instead of crawling the html-doc directory on your server, CitemaPP crawls your server's content over HTTP.