A general-purpose source code indexer and cross-referencer that provides web-based browsing of source code, with links to the definition and usages of any identifier. Supports multiple languages. Up-to-date information is available at http://lxr.sourceforge.net
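The core idea of a cross-referencer, mapping every identifier to the lines where it is defined and where it is used, can be sketched for a single language. This is a minimal illustration using Python's standard `ast` module on Python source only; it is not how LXR itself works (LXR covers many languages), and the function name `cross_reference` is hypothetical.

```python
import ast
from collections import defaultdict

def cross_reference(source):
    """Map each identifier in Python source to the line numbers where it is defined and used."""
    defs, uses = defaultdict(list), defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defs[node.name].append(node.lineno)      # def / class statements
        elif isinstance(node, ast.arg):
            defs[node.arg].append(node.lineno)       # function parameters
        elif isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defs[node.id].append(node.lineno)    # assignment targets
            elif isinstance(node.ctx, ast.Load):
                uses[node.id].append(node.lineno)    # reads
    names = set(defs) | set(uses)
    return {n: {"defined": sorted(defs[n]), "used": sorted(uses[n])} for n in names}
```

A web front end would then render each identifier as a link to its "defined" lines and list its "used" lines.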
Search engine and data mining applications and ClueWeb datasets.
The Lemur Project develops search engines, browser toolbars, text analysis tools, and data resources that support research and development of information retrieval and text mining software, including the Indri search engine in C++, the Galago search engine research framework in Java, the RankLib learning to rank library, ClueWeb09 and ClueWeb12 datasets and the Sifaka data mining application.
XMLTV (http://xmltv.org/) is for grabbing TV listings, primarily from websites. It includes a grabber for Danish television that fetches listings from http://tv.tv2.dk, and this project maintains several others. Documentation is available at http://niels.dybdahl.dk/xmltvdk
Framework for text mining, data integration and data analysis. Keywords: ontology and graph alignment, relation mining, warehouse, semantic database integration, bioinformatics, systems biology, microarray, Java.
Framework (scripts, configuration, code) to build free and public services around travel and leisure data. The project makes extensive use of existing data sources such as Geonames and DBpedia, and adds some glue around them (e.g., links).
High performance distributed in-memory key/value store
Infinispan is an open-source, Java-based data grid platform. ***IMPORTANT*** Starting with Infinispan 5.0.0.FINAL, Infinispan releases are no longer hosted on SourceForge. They are now available at www.jboss.org/infinispan/downloads
Spider that collects data from the MySpace social network.
Currently, it is designed only to extract information about Native American people, because it is used for a social science study at UNAM (Universidad Nacional Autónoma de México).
Webstats Solr is an attempt to make Apache access logs easier to data-mine. By adding a powerful search engine (Solr) as a backend, and using JavaScript, HTML and possibly PHP, I hope to supersede AWStats.
This is a Python script that parses your irssi logs and inserts them into a MySQL database, which you can then use to search and display your logs on the web. It updates the database incrementally from the logs and is best run frequently as a cron job.
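A minimal sketch of the incremental approach, assuming irssi's default `HH:MM <nick> message` line format and using SQLite from Python's standard library in place of MySQL so the example is self-contained. The table layout and the functions `init_db` and `ingest` are hypothetical, not the script's actual code; the idea is to remember how many bytes of each log file have already been imported, so a frequent cron run only processes new lines.

```python
import re
import sqlite3

# Assumption: irssi's default log line "HH:MM <nick> message", where the
# nick may be preceded by a mode character (' ', '@', '+' or '%').
LINE_RE = re.compile(r"^(\d{2}:\d{2}) <[ @+%]?([^>]+)> (.*)$")

def init_db(conn):
    conn.execute("CREATE TABLE IF NOT EXISTS messages (file TEXT, time TEXT, nick TEXT, text TEXT)")
    # Byte offset up to which each log file has already been imported.
    conn.execute("CREATE TABLE IF NOT EXISTS offsets (file TEXT PRIMARY KEY, pos INTEGER)")

def ingest(conn, path):
    """Import only lines appended to `path` since the previous run; return how many messages were added."""
    row = conn.execute("SELECT pos FROM offsets WHERE file = ?", (path,)).fetchone()
    pos = row[0] if row else 0
    added = 0
    with open(path, "rb") as f:
        f.seek(pos)
        for raw in f:
            if not raw.endswith(b"\n"):
                break  # incomplete line still being written; pick it up next run
            pos += len(raw)
            m = LINE_RE.match(raw.decode("utf-8", errors="replace").rstrip("\r\n"))
            if m:  # skip non-message lines such as "--- Log opened ..."
                conn.execute("INSERT INTO messages VALUES (?, ?, ?, ?)", (path, *m.groups()))
                added += 1
    conn.execute("INSERT OR REPLACE INTO offsets VALUES (?, ?)", (path, pos))
    conn.commit()
    return added
```

Running `ingest` for each log file from cron keeps the database current without re-reading old lines. Log rotation that truncates a file would need extra handling (for example, resetting the offset when the file shrinks), which this sketch omits.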
Visualization of the contact network and user data from the popular business network XING.com. The web-based software can be used by any registered XING user.
Build your music portal easily. With this PHPNuke module you can publish music information: artist data, albums, songs, audio samples, chart lists, themes and more. It can be installed without any technical knowledge.
Irudiko is a C++ library for generating Locality Sensitive Hashing sketches from any textual or web document. Designed mainly to work with HTML pages, it also includes optimized support for English and Italian documents.
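One common way to build LSH sketches for text is MinHash over word shingles, where the fraction of matching sketch positions estimates the Jaccard similarity of two documents. The following is a generic Python illustration of that technique, not Irudiko's C++ API; the shingle size, the number of hash functions, and the use of salted BLAKE2b as a hash family are all assumptions.

```python
import hashlib
import re

def shingles(text, k=3):
    """Return the set of k-word shingles of a document (case-insensitive)."""
    words = re.findall(r"\w+", text.lower())
    # Documents shorter than k words yield a single, shorter shingle.
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def minhash(text, num_hashes=64, k=3):
    """MinHash sketch: for each salted hash function, keep the minimum hash over all shingles."""
    sh = shingles(text, k)
    sketch = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(4, "big")  # a different salt acts as a different hash function
        sketch.append(min(
            int.from_bytes(hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(), "big")
            for s in sh))
    return sketch

def estimate_jaccard(a, b):
    """The fraction of positions where two sketches agree approximates Jaccard similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)
```

For two nine-word sentences that differ only in the last word, the sets of 3-word shingles share 6 of 8 distinct shingles (Jaccard similarity 0.75), and the 64-position estimate lands near that value, while unrelated texts score close to 0.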
This project intends to create an indexing search engine for knowledge management. The primary objective is to build an information retrieval core and to implement knowledge discovery techniques such as data mining and text mining algorithms.
A configurable knowledge management framework. It works out of the box, but it is meant mainly as a framework for building complex information retrieval and analysis systems. Its three major components (Crawler, Analyzer and Indexer) can also be used separately.
phpTrafMon is a set of scripts written in PHP. It shows, in an attractive and user-friendly way, the traffic on a local network and each user's share of it. phpTrafMon requires MySQL, crontab and the popular IPFM program.
Image2DocInfo was made to quickly tag digital pictures. A GUI lets you set attributes for an image and then store them in XML files. These files follow the Dublin Core naming scheme and are stored in the same directories as the pictures.
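A minimal sketch of writing such a sidecar file with Python's standard `xml.etree.ElementTree`. The element names come from the Dublin Core Metadata Element Set (namespace `http://purl.org/dc/elements/1.1/`); the `<image>.xml` file name, the `metadata` root element and the `write_sidecar` function are hypothetical and may not match Image2DocInfo's actual files.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {"title", "creator", "subject", "description", "publisher",
               "contributor", "date", "type", "format", "identifier",
               "source", "language", "relation", "coverage", "rights"}
ET.register_namespace("dc", DC_NS)

def write_sidecar(image_path, **fields):
    """Write a hypothetical '<image_path>.xml' next to the image, one dc:<name> element per field."""
    unknown = set(fields) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    root = ET.Element("metadata")  # hypothetical root element name
    for name, value in fields.items():
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    out_path = image_path + ".xml"  # same directory as the picture
    ET.ElementTree(root).write(out_path, encoding="utf-8", xml_declaration=True)
    return out_path
```

Restricting field names to the 15 standard elements is one way to keep the files consistent with the Dublin Core naming scheme.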
JavaMatch is an engine that can search runtime Java data structures and find the objects that best match the criteria you specify. Its extensive query mechanism allows highly customizable tuning of match queries.
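The idea of ranking objects by how well they match weighted criteria, rather than filtering on exact equality, can be sketched as follows. This is a generic Python illustration, not JavaMatch's query API; `Criterion`, `best_matches` and the weighted-average scoring rule are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Criterion:
    """Hypothetical criterion: which attribute to inspect, how similar it is (0.0 to 1.0), and how much it counts."""
    attr: str
    similarity: Callable[[Any], float]
    weight: float = 1.0

def best_matches(objects, criteria, top=3):
    """Rank objects by the weighted average of their per-criterion similarity scores."""
    total = sum(c.weight for c in criteria)
    scored = [(sum(c.weight * c.similarity(getattr(obj, c.attr)) for c in criteria) / total, obj)
              for obj in objects]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top]
```

Unlike an exact-match filter, an object that misses one criterion can still rank highly if it scores well on the heavily weighted ones; tuning a query then amounts to adjusting weights and similarity functions.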
The Internet Censor is a multi-platform Internet clustering program; the resulting data will be used to create a non-profit, content-filtering Internet search engine for children.