memfree is an open source hybrid AI search engine and page generation platform designed to help users retrieve information from both personal knowledge bases and the public web through a unified interface. The project combines retrieval-augmented search with AI summarization to deliver concise answers instead of forcing users to manually sift through multiple sources. It supports multiple AI models and search providers, enabling flexible configuration depending on cost, performance, or...
Project moved to GitHub!
https://github.com/carrot2/carrot2
Carrot2 is an Open Source Search Results Clustering Engine. It can automatically organize small collections of documents, e.g. search results, into thematic categories. Carrot2 integrates very well with both Open Source and proprietary search engines.
TestEl is a Java-based learning analyzer for HTML (and possibly other) structured documents. It can be trained to detect structures in such documents and renders hits in XML.
The Wikipedia Miner toolkit provides simplified access to Wikipedia. This open encyclopedia represents a vast, constantly evolving multilingual database of concepts and semantic relations; a promising resource for NLP and related research.
True-Hybrid Web Search Engine, which is designed to organize web-based information by making heavy use of a mutually beneficial collaboration between Human and Artificial Intelligence.
Web-as-corpus tools in Java.
* Simple Crawler (and also integration with Nutch and Heritrix)
* HTML cleaner to remove boilerplate code
* Language recognition
* Corpus builder
TextMine is for the Perl hacker who is grappling with the problems of managing unstructured text from various sources. You can use these text mining tools to search the Web, index text, extract entities, categorize your e-mail, and summarize documents.
S3B - Social Semantic Search and Browsing - is a middleware that delivers a set of search and browsing components that can be used in J2EE web applications to deliver user-oriented features based on semantic descriptions and social networking.
DOSE: a distributed platform for semantic elaboration that provides semantic services such as automatic annotation of web resources at the document substructure level, semantic search facilities, semantic annotation storage and retrieval.
Iris is an interface for monitoring multiple pages and RSS files for changes. You can assign keywords or regular expressions to each web page to receive mail alerts. Version 6.6 is a web-based application; since version 7.2 it is a Perl-gtk application.
Open Source Semantic Web Search Engine Software: If two machines anywhere on the web can agree on the same definition of a digital service or digital good, then machine-to-machine transactions can use this lingua franca to transact on the user's behalf.
3store is an RDF "triple store", written in C and backed by MySQL and Berkeley DB. It is an optimisation and port of an older triple store (WebKBC). It provides access to the RDF data via RDQL or SPARQL over HTTP, on the command line or via a C API.
The Nokia Semantic Web Server is an RDF based knowledge portal for publishing both authoritative and third party descriptions of URI denoted resources. It also serves as Nokia's reference implementation of URIQA, the URI Query Agent model.
SENTENSA Knowledge Miner is a platform-independent tool for searching any text. SENTENSA uses robust methods of indexing and searching text, drawing on more than 20 years of experience in information retrieval.
Catalogo is a system for cataloguing resources on a web site. It allows semantic search of information on an intranet using metadata, RDF and ontology concepts. It provides a Catalog server (Java web applications) and a Catalog client (Firefox plug-in).
The Jorne project develops software and open standards for linking Lojban text with WWW and Semantic Web metadata (e.g. RDF/N3, RSS, XML). Lojban is an artificial spoken and written language based on predicate logic.
Webcomic Archive and News Generator (WANG) is a database-driven PHP application built for both aspiring and existing web comics. Written with a focus on security and speed, the code is built to be easy to use for code novices and experts alike.
SharpResource is a smart web resource retrieval engine for script-based and automatic internet data mining, written in C#. It is component-driven and fully customizable. It aims to be a versatile and robust library, not a system.
The OpenBorges project intends to provide a humble place to experiment with, and debate, what an open, distributed, adaptive, and collaborative semantic virtual library can be. Inspirations are: As We May Think, Library of Babel, and Weaving the Web.
Sprawler is the first Open Source internet search engine software and service, built by the community, for the community. It will address the various reasons most search engines today are still far from where they need to be.
IDEAL stands for Information DEALer: a system that provides the news and articles the user wants. It uses Tomcat, Struts, Java, MySQL, an agent system, clustering, TF/IDF, and a document parser, and it supports multiple users.
HORUS is a system for knowledge acquisition, hypothesis generation, inference and learning. It is an interactive, internet environment accessible to a diverse community of users (public-access or membership basis) - see also UMKAILASH project for more.