Hunspell is a spell checker and morphological analyzer library and program designed for languages with rich morphology and complex compounding or character encoding. Hunspell interfaces: Curses, Ispell-compatible pipe interface, OpenOffice.org UNO module.
CLucene is a C++ port of Lucene, the high-performance, full-featured text search engine written in Java. Because it is written in C++, CLucene is described as faster than Lucene.
The ht://Dig system is a complete indexing and searching system for a domain or intranet. This system is not meant to replace the need for powerful internet-wide search systems like Lycos, Infoseek, Google and AltaVista.
High performance distributed in-memory key/value store
Infinispan is an open source, Java-based data grid platform. ***IMPORTANT*** Starting with Infinispan 5.0.0.FINAL, Infinispan releases are no longer hosted on SourceForge. They can now be found at www.jboss.org/infinispan/downloads
Online news and newspaper harvester, like an RSS newsreader with a database. Covers national and international news. Very detailed; catches hard-to-find news articles. Allows reposting of summaries with comments to Usenet newsgroups, complex searches, and more.
Data Fountains is an automated collection building system of benefit to Internet portals, digital libraries and library catalogs. Web crawlers find new resources. Text extractors/classifiers create metadata, descriptions, rich full-text. C++.
GImageSpider (GIS) is an image spider with two abilities. It can search the web through image search engines to find images, and it can crawl an arbitrary site under constraints you define to find images.
Harvestman is a context-aware metasearch engine which functions as a universal information gatherer and data mining system for the internet.
A web crawler which uses regular expressions on text downloaded from a site.
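As a rough illustration of the idea (not this project's actual code), the Python sketch below applies a regular expression to text that has already been downloaded. The pattern, the function name `extract_links`, and the sample markup are all hypothetical.

```python
import re

# Hypothetical pattern: captures the target of href="..." or href='...'.
HREF_PATTERN = re.compile(r'href=["\']([^"\']+)["\']', re.IGNORECASE)

def extract_links(page_text):
    """Return every href target found in the downloaded text, in order."""
    return HREF_PATTERN.findall(page_text)

# Sample text standing in for a page fetched by the crawler.
sample = '<a href="/about.html">About</a> <a href=\'http://example.com/\'>Ex</a>'
links = extract_links(sample)
```

A regular expression is a pragmatic choice for loosely structured text, but it is not a substitute for a real HTML parser on malformed markup.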
LANbyrinth is a bot that indexes a LAN and organizes its files. It is initially focused on indexing MP3 files. Features: fully configurable; fast and smart searching; recognizes duplicated files; organizes songs by artist, album, etc.
This project aims to provide a clean API, and the corresponding C++ implementation, for parsing travel-focused requests (e.g., "washington dc beijing monday r/t +aa -ua 1 week 2 adults 1 dog").
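To make the example request concrete, here is a minimal Python sketch of a token-based parser. It is not the project's API or its C++ implementation. It assumes that "r/t" means round trip, that "+xx" and "-xx" mark preferred and excluded airline codes, and that a number is followed by the thing it counts; the function name `parse_request` and the result keys are hypothetical.

```python
import re

def parse_request(text):
    """Split a travel request into a few rough categories (assumed semantics)."""
    result = {"round_trip": False, "preferred": [], "excluded": [],
              "quantities": [], "rest": []}
    tokens = text.lower().split()
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "r/t":                              # assumed: round trip
            result["round_trip"] = True
        elif re.fullmatch(r"\+[a-z0-9]{2}", tok):     # assumed: preferred airline
            result["preferred"].append(tok[1:])
        elif re.fullmatch(r"-[a-z0-9]{2}", tok):      # assumed: excluded airline
            result["excluded"].append(tok[1:])
        elif tok.isdigit() and i + 1 < len(tokens):   # number plus its unit
            result["quantities"].append((int(tok), tokens[i + 1]))
            i += 1                                    # skip the unit token
        else:
            result["rest"].append(tok)                # places, dates: left unparsed
        i += 1
    return result

parsed = parse_request("washington dc beijing monday r/t +aa -ua 1 week 2 adults 1 dog")
```

The place names and the date are left in `rest` because resolving them ("washington dc" as one city, "monday" as a date) needs reference data that this sketch does not have.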
An XQuery 1.0, XSLT 2.0 and XPath 2.0 implementation.
RSS spider for gathering multiple RSS feeds into a single place with search capabilities.
Relational storage for tagged documents
Restad is an indexing and querying tool for tagged documents. It uses a relational database for storage and querying. See the latest news on the blog: https://sourceforge.net/p/restad/blog/ The first Ruby prototype can be found at https://github.com/ymoreau/Restad
A fast way to rate the reading difficulty of a book or text. Unlike well-known readability metrics such as Fog, Kincaid, SMOG, ARI, Flesch, and Coleman-Liau, this metric takes into account far more factors and is standardized against a corpus.
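For context, the sketch below computes one of the classic metrics named above, the Flesch Reading Ease score, in Python. It does not implement this project's own metric, whose factors are not described here. The syllable counter is a crude vowel-run heuristic chosen for illustration, and the function names are hypothetical.

```python
import re

def count_syllables(word):
    # Rough heuristic: each run of vowels counts as one syllable (minimum 1).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

easy = flesch_reading_ease("The cat sat. The dog ran.")
hard = flesch_reading_ease(
    "Institutional accountability necessitates comprehensive organizational transparency."
)
```

The formula uses only sentence length and syllables per word, which is the kind of narrow input a richer, corpus-standardized metric would go beyond.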
The project provides an incubator for intelligent agent-assisted, AR gaming-oriented BI applications generated through the STALEMATE Knowledge-based System Design Environment (KBSDE), integrating Web-enabled knowledge bases, data mining and warehousing and directed at asset management and investment banking.
Sciense Searcher is a system that lets you search, organize and share bibliographic citations of research articles, books, booklets, collections, manuals, theses, proceedings, technical reports, unpublished publications and miscellaneous items.
SNT is a search engine for SMB and FTP shares with a crawler running on Win32. A web interface is provided for searching files and browsing share contents. It also provides a list of shared films with user ratings and comments.
The Species Analyst (TSA) is a research project developing standards and software tools for access to the world's natural history collection and observation databases.
Clucened is a project to build a daemon around CLucene, which is a C++ implementation of the Lucene search engine. This is *not* the CLucene project, but is a separate project to write a generic daemon based on CLucene.
iVia is an Internet subject portal or virtual library system. It is a hybrid expert- and machine-built collection creation and management system: resources can be crawled, and metadata and selected full text can be automatically generated or extracted.
Uni-wordsplit aims to provide a Unicode lexical analysis (word splitting) system, designed especially for CJK (Chinese/Japanese/Korean) users. The code is based on Mozilla XPCOM code.