Showing 736 open source projects for "data"

  • 1
    Spring Data Neo4j

    Provide support to increase developer productivity in Java

    ...The template programming model is equivalent to other Spring templates; it forms the basis for interaction with the graph and is also used for the Spring Data repository support. Spring Data Neo4j is a core part of the Spring Data project, which aims to provide convenient data access for NoSQL databases. Spring Data builds on Spring Framework; see the spring.io website for a wealth of reference documentation. If you are just starting out with Spring, try one of the guides.
    Downloads: 1 This Week
  • 2
    Logstash

    Centralize, transform and stash your data

    Logstash is a server-side data processing pipeline that dynamically ingests data from numerous sources, transforms it, and ships it to your favorite “stash” regardless of format or complexity. It supports and ingests data of all shapes, sizes and sources, dynamically transforms and prepares this data, and transports it to the output of your choice. Logstash is extensible, with over 200 plugins available to let you create and configure your pipeline how you choose.
    Downloads: 20 This Week
  • 3
    OpenRefine

    A free, open source, powerful tool for working with messy data

    OpenRefine is a powerful Java-based tool designed to work with messy data and improve it. With OpenRefine you can load data, understand it, clean it up, transform it, reconcile it, and augment it with web services and external data. It allows you to do this all from a web browser and in the convenience and privacy of your own computer. OpenRefine keeps all data securely in your computer by running a small server on it, using your web browser to interact with it. ...
    Downloads: 11 This Week
  • 4
    Apache HBase

    Get random, realtime read/write access to your Big Data

    Use Apache HBase™ when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables (billions of rows by millions of columns) atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS. ...
    Downloads: 8 This Week
  • 5
    Jailer Database Tool

    Database subsetting and relational data browsing tool

    Jailer is a tool for database subsetting and for schema and data browsing. It creates small slices from your database and lets you navigate through it by following the relationships. It is ideal for creating small samples of test data or for local problem analysis with relevant production data. It creates small slices from your production database and imports the data into your development and test environments, consistent and referentially intact.
    Downloads: 7 This Week
  • 6
    HugeGraph

    A graph database that supports more than 100 billion data items

    ...HugeGraph supports fast import performance for graphs with more than 10 billion vertices and edges, offers millisecond-level OLTP query capability, and can be integrated into big data platforms like Hadoop or Spark for OLAP analysis. The main scenarios of HugeGraph include correlation search, fraud detection, and knowledge graphs. It supports the Gremlin graph query language and a RESTful API, and also provides commonly used graph algorithm APIs. To help users easily implement various queries and analyses, HugeGraph has a full range of accessory tools and supports distributed storage, data replication, horizontal scaling, and many built-in storage engine backends.
    Downloads: 5 This Week
  • 7
    Chat2DB

    AI-driven database tool and SQL client

    ...Just enter the names of the tables and columns, and we will automatically configure the type, password, and comment, saving you 90% of the time. Imports and exports data in multiple formats (CSV, XLSX, XLS, SQL) to facilitate exchange, backup, and migration. Transfers data between different databases or through cloud services, serving as a backup and recovery solution that keeps data loss and downtime during migrations to a minimum.
    Downloads: 13 This Week
  • 8
    Datacap

    DataCap is integrated software for data transformation

    Datacap is an open-source data catalog and governance tool that helps organizations manage and document their data assets. It provides metadata management, lineage tracking, and collaboration features to ensure data transparency and quality. Datacap is designed for teams that need a lightweight, self-hosted solution to organize and govern their data ecosystems.
    Downloads: 2 This Week
  • 9
    Addax

    Addax is a versatile open-source ETL tool

    Addax is a data integration and ETL (Extract, Transform, Load) tool designed for high-performance data migration tasks. It simplifies the process of moving data between different systems and formats.
    Downloads: 1 This Week
  • 10
    Canal

    MySQL binlog

    Canal is an open-source project developed by Alibaba that simulates MySQL slave functionality to parse MySQL binlog files. It enables real-time data synchronization and change data capture (CDC) between MySQL and other systems such as Elasticsearch, Kafka, or HBase. Canal is widely used for data integration, replication, and monitoring across distributed systems, offering high performance and low-latency log parsing.
    Downloads: 2 This Week
  • 11
    eXist-db

    eXist Native XML Database and Application Platform

    eXist-db is an open-source, native XML database and application platform that provides a powerful environment for storing, querying, and managing XML documents. It is designed for complex data management needs, offering XQuery, XSLT, and RESTful web services for interacting with structured data.
    Downloads: 7 This Week
  • 12
    XTDB

    General-purpose bitemporal database for SQL, Datalog & graph queries

    ...Both structured and unstructured data are at home in XTDB. Legal regulations like GDPR often pose a challenge when designing systems around immutable data.
    Downloads: 2 This Week
  • 13
    Infinispan

    Infinispan is an open source data grid platform

    Infinispan is a distributed in-memory data grid and caching system designed for high-performance computing. It allows applications to scale dynamically by distributing data across multiple nodes, reducing latency and improving resilience.
    Downloads: 2 This Week
  • 14
    IoTDB

    Apache IoTDB

    Apache IoTDB (Database for Internet of Things) is an IoT-native database with high performance for data management and analysis, deployable on the edge and in the cloud. Thanks to its lightweight architecture, high performance, and rich feature set, together with its deep integration with Apache Hadoop, Spark, and Flink, Apache IoTDB can meet the requirements of massive data storage, high-speed data ingestion, and complex data analysis in industrial IoT fields. ...
    Downloads: 0 This Week
  • 15
    Flink CDC

    Flink CDC is a streaming data integration tool

    Apache Flink CDC is a distributed data integration tool that captures data changes in real-time from various databases. It leverages Change Data Capture (CDC) technology to stream data changes into Apache Flink, enabling real-time analytics and data processing. Flink CDC simplifies data pipeline development with its declarative YAML configurations.
    Downloads: 0 This Week
  • 16
    ReplicaDB

    ReplicaDB is an open source tool for database replication

    ReplicaDB is an open-source, multi-platform tool for database replication, enabling the migration and synchronization of data across different relational and NoSQL databases. It is optimized for efficiency and minimal downtime.
    Downloads: 1 This Week
  • 17
    Vespa

    The open big data serving engine

    Make AI-driven decisions using your data, in real-time. At any scale, with unbeatable performance. Vespa is a full-featured text search engine and supports both regular text search and fast approximate vector search (ANN). This makes it easy to create high-performing search applications at any scale, whether you want to use traditional techniques or a modern vector-based approach.
    Downloads: 1 This Week
  • 18
    ObjectBox Java Database

    Java and Android Database - fast and lightweight without any ORM

    High-performance NoSQL database with integrated Data Sync for decentralized Edge Computing. Get fast access to the data you need across Embedded Devices, IoT, and Mobile. ObjectBox is 10x faster than any alternative, with high-speed data ingestion that improves response times. Check out our benchmarks. ObjectBox's out-of-the-box synchronization makes data available when and where it is needed, eliminating data islands and improving data flow. ...
    Downloads: 0 This Week
  • 19
    Flyway

    Database migrations made easy

    ...Migrate from any version (including an empty database) to the latest version of the schema. Plain SQL scripts (including placeholder replacement). No proprietary XML formats, no lock-in. Java-based migrations for advanced data transformations and handling of LOBs. All you need is Java 7+ and your JDBC driver and you're good to go! Full support for Amazon RDS, Microsoft SQL Azure, Google Cloud SQL, Heroku, and more. Filesystem and classpath scanning to automatically discover SQL and Java migrations. Ship migrations together with the application and run them automatically on startup using the API.
    Downloads: 66 This Week
  • 20
    ShardingSphere

    Distributed database ecosphere

    Apache ShardingSphere is an open-source ecosystem consisting of a set of distributed database solutions, including three independent products: JDBC, Proxy, and Sidecar (planned). They all provide data scale-out, distributed transactions, and distributed governance, and apply to a variety of situations such as Java isomorphism, heterogeneous languages, and cloud native. Apache ShardingSphere aims to make full use of the computation and storage capacity of existing databases in a distributed system, rather than being a totally new database. ...
    Downloads: 2 This Week
  • 21
    Apache Baremaps

    Create custom vector tiles from OpenStreetMap

    Baremaps is an Apache Incubator project that provides tools and a Java-based pipeline for building and rendering vector tiles from OpenStreetMap (OSM) data. It’s designed for fast map generation, serving tiles, and supporting real-time updates, making it a powerful backend for map-based applications.
    Downloads: 1 This Week
  • 22
    KCloud‑Platform‑IoT

    KCloud-Platform-IoT

    KCloud-Platform-IoT is a comprehensive open-source IoT management platform built with Spring Cloud and Vue.js. It supports device registration, data collection, rule-based processing, and dashboard visualization. Designed for scalability and modularity, the platform is ideal for managing large IoT fleets in industrial or smart city environments.
    Downloads: 1 This Week
  • 23
    Apache Druid

    A high performance real-time analytics database

    Druid is designed for workflows where fast ad-hoc analytics, instant data visibility, or support for high concurrency is important. As such, Druid is often used to power UIs where an interactive, consistent user experience is desired. Druid streams data from message buses such as Kafka and Amazon Kinesis, and batch-loads files from data lakes such as HDFS and Amazon S3. Druid supports most popular file formats for structured and semi-structured data. ...
    Downloads: 0 This Week
  • 24
    ConcourseDB

    Distributed database warehouse for transactions, search and analytics

    ConcourseDB is a distributed, self-tuning database designed for real-time applications, offering strong consistency and ACID compliance without requiring complex configurations. It provides dynamic schema support and automatic indexing.
    Downloads: 3 This Week
  • 25
    CrateDB

    CrateDB is a distributed and scalable SQL database

    CrateDB is a distributed SQL database designed for massive machine data and real-time analytics. It combines the scalability and performance of NoSQL with the power and simplicity of SQL, allowing for horizontal scaling, full-text search, and complex queries over large datasets. Built in Java and powered by Elasticsearch and Lucene, CrateDB is optimized for high-velocity data ingestion and dynamic queries.
    Downloads: 0 This Week