Page 3 | data free download

Showing 6127 open source projects for "data"

Filter: Java

1

Zeebe

Distributed Workflow Engine for Microservices Orchestration

...Zeebe’s cloud-native design provides the performance, resilience, and security enterprises need to future-proof their process orchestration efforts. Zeebe distributes data across all brokers in a cluster with storage directly on the server filesystem. If one broker goes down, another can replace it with no data loss. This pre-configured replication mechanism ensures that Camunda Platform 8 can recover from machine or software failure with no human interaction, no data loss and minimal downtime.

Downloads: 17 This Week

Last Update: 8 hours ago
See Project
2

Dolphin Scheduler

A distributed and extensible workflow scheduler platform

Apache DolphinScheduler is a distributed and extensible workflow scheduler platform with powerful DAG visual interfaces. It is dedicated to solving complex job dependencies in data pipelines and provides various types of jobs out of the box. Its architecture is decentralized, with multiple masters and multiple workers; it supports high availability natively and includes overload processing. All process definition operations are visualized, key information is visible at a glance, and deployment is one-click. ...

Downloads: 3 This Week

Last Update: 2026-03-01
See Project
3

Stirling-PDF

Web application that allows you to perform operations on PDF files

Stirling PDF is a powerful, locally hosted web-based PDF manipulation tool offering a wide range of editing, conversion, and utility features. It allows users to merge, split, compress, convert, OCR, and perform other operations on PDF files directly from a browser without uploading data to third-party servers. The tool is privacy-conscious, self-hostable via Docker, and built with modularity in mind to allow future expansion and integration.

Downloads: 37 This Week

Last Update: 2026-04-04
See Project
4

Genie

Distributed Big Data Orchestration Service

Genie is a completely open source distributed job orchestration engine developed by Netflix. Genie provides REST-ful APIs to run a variety of big data jobs like Hadoop, Pig, Hive, Spark, Presto, Sqoop and more. It also provides APIs for managing the metadata of many distributed processing clusters and the commands and applications which run on them.

Downloads: 1 This Week

Last Update: 2025-08-05
See Project
5

JimuReport

Open source drag-and-drop reporting and dashboard builder platform

...JimuReport supports traditional report generation, print templates, and modern dashboard visualizations for business intelligence scenarios. JimuReport also includes components for building interactive charts, data tables, and analytical displays that can be used in enterprise applications. It can connect to multiple data sources and retrieve data through SQL queries, APIs, or other structured formats. It can be embedded into Java applications using Spring Boot integration modules.

Downloads: 5 This Week

Last Update: 4 days ago
See Project
6

KCloud‑Platform‑IoT

KCloud-Platform-IoT

KCloud-Platform-IoT is a comprehensive open-source IoT management platform built with Spring Cloud and Vue.js. It supports device registration, data collection, rule-based processing, and dashboard visualization. Designed for scalability and modularity, the platform is ideal for managing large IoT fleets in industrial or smart city environments.

Downloads: 8 This Week

Last Update: 5 days ago
See Project
7

Flink CDC

Flink CDC is a streaming data integration tool

Apache Flink CDC is a distributed data integration tool that captures data changes in real-time from various databases. It leverages Change Data Capture (CDC) technology to stream data changes into Apache Flink, enabling real-time analytics and data processing. Flink CDC simplifies data pipeline development with its declarative YAML configurations.

Downloads: 1 This Week

Last Update: 2026-03-29
See Project
8

Reactor Core

Non-Blocking Reactive Foundation for the JVM

Reactor Core is a foundational library for building reactive applications in Java, providing a powerful API for asynchronous, non-blocking programming.

Downloads: 7 This Week

Last Update: 4 days ago
See Project
9

Frouros

Frouros is an open-source Python library for drift detection

Frouros is a Python library for drift detection in machine learning systems that provides a combination of classical and more recent algorithms for both concept and data drift detection.

Downloads: 6 This Week

Last Update: 2024-09-29
See Project
10

Apache Avro

Apache Avro is a data serialization system

Apache Avro™ is a data serialization system. It offers simple integration with dynamic languages. Code generation is not required to read or write data files, nor to use or implement RPC protocols. Code generation is an optional optimization, only worth implementing for statically typed languages. Avro relies on schemas. When Avro data is read, the schema used when writing it is always present.

Downloads: 1 This Week

Last Update: 2024-08-05
See Project
11

eXist-db

eXist Native XML Database and Application Platform

eXist-db is an open-source, native XML database and application platform that provides a powerful environment for storing, querying, and managing XML documents. It is designed for complex data management needs, offering XQuery, XSLT, and RESTful web services for interacting with structured data.

Downloads: 10 This Week

Last Update: 2026-03-05
See Project
12

Hazelcast

Open-source distributed computation and storage platform

Hazelcast is a streaming and memory-first application platform for fast, stateful, data-intensive workloads on-premises, at the edge or as a fully managed cloud service. Hazelcast is a distributed computation and storage platform for consistently low-latency querying, aggregation and stateful computation against event streams and traditional data sources. It allows you to quickly build resource-efficient, real-time applications.

Downloads: 8 This Week

Last Update: 2025-10-15
See Project
13

JavaParser

Java 1-17 Parser and Abstract Syntax Tree for Java

This project contains a set of libraries implementing a Java 1.0 - Java 17 Parser with advanced analysis functionalities. The project binaries are available in Maven Central. We strongly advise users to adopt Maven, Gradle or another build system for their projects. If you are not familiar with them we suggest taking a look at the maven quickstart projects. Since Version 3.5.10, the JavaParser project includes the JavaSymbolSolver. While JavaParser generates an Abstract Syntax Tree,...

Downloads: 8 This Week

Last Update: 2026-01-10
See Project
14

Qualitis

Qualitis is a one-stop data quality management platform

Qualitis is a data quality management platform that supports quality verification, notification, and management for various data sources. It is used to solve data quality problems caused by data processing. Based on Spring Boot, Qualitis submits quality model tasks to the Linkis platform. It provides functions such as data quality model construction, data quality model execution, data quality verification, and data quality report generation. ...

Downloads: 0 This Week

Last Update: 2025-10-17
See Project
15

Planetiler

Flexible tool to build planet-scale vector tilesets

...Planetiler packages tiles into an MBTiles (SQLite) or PMTiles file that can be served using tools like TileServer GL or Martin or even queried directly from the browser. See awesome-vector-tiles for more projects that work with data in this format. Planetiler works by mapping input elements to vector tile features, flattening them into a big list, and then sorting by tile ID to group them into tiles.

Downloads: 11 This Week

Last Update: 2026-03-28
See Project
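
The Planetiler description above (map input elements to tile features, flatten into one list, sort by tile ID, group into tiles) can be illustrated with a minimal Python sketch. This is not Planetiler's code or API. The element layout, the `tile_id` tuple, and the function names `map_element` and `build_tiles` are all hypothetical, chosen only to show the map, flatten, sort, group sequence.

```python
from itertools import groupby
from operator import itemgetter


def tile_id(zoom, x, y):
    # Hypothetical tile ID: a sortable (zoom, x, y) tuple.
    # Planetiler's real tile ID encoding is not shown in the source.
    return (zoom, x, y)


def map_element(element):
    # Map one input element to (tile_id, feature) pairs,
    # one pair for every tile the element touches.
    feature = {"name": element["name"], "kind": element["kind"]}
    return [(tile_id(z, x, y), feature) for z, x, y in element["tiles"]]


def build_tiles(elements):
    # 1. Map every element and flatten the results into one big list.
    flat = [pair for element in elements for pair in map_element(element)]
    # 2. Sort by tile ID so features of the same tile become adjacent.
    flat.sort(key=itemgetter(0))
    # 3. Group adjacent entries into tiles.
    return {
        tid: [feature for _, feature in group]
        for tid, group in groupby(flat, key=itemgetter(0))
    }


# Made-up input: each element lists the tiles it covers.
elements = [
    {"name": "Main St", "kind": "road", "tiles": [(1, 0, 0), (1, 1, 0)]},
    {"name": "Lake", "kind": "water", "tiles": [(1, 1, 0)]},
]
tiles = build_tiles(elements)
```

Sorting before grouping is what lets each tile be assembled in a single pass over the list, which is presumably why the approach suits very large inputs.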
16

XTDB

General-purpose bitemporal database for SQL, Datalog & graph queries

...Both structured and unstructured data are at home in XTDB. Legal regulations like GDPR often pose a challenge when designing systems around immutable data.

Downloads: 3 This Week

Last Update: 2025-12-01
See Project
17

Apache Drill

Apache Drill is a distributed MPP query layer for self-describing data

Apache Drill is a distributed MPP query layer that supports SQL and alternative query languages against NoSQL and Hadoop data storage systems. It was inspired in part by Google's Dremel. Get faster insights without the overhead (data loading, schema creation and maintenance, transformations, etc.). Analyze multi-structured and nested data in non-relational datastores directly, without transforming or restricting the data. Leverage your existing SQL skillsets and BI tools including Tableau, QlikView, MicroStrategy, Spotfire, Excel and more. ...

Downloads: 2 This Week

Last Update: 2025-06-17
See Project
18

GeoServer

GeoServer repository

GeoServer is an open-source software server written in Java that allows users to share and edit geospatial data. Designed for interoperability, it publishes data from any major spatial data source using open standards. Being a community-driven project, GeoServer is developed, tested, and supported by a diverse group of individuals and organizations from around the world. GeoServer is the reference implementation of the Open Geospatial Consortium (OGC) Web Feature Service (WFS) and Web Coverage Service (WCS) standards. It is also a high-performance, certified-compliant Web Map Service (WMS), is compliant with Catalog Service for the Web (CSW), and implements Web Processing Service (WPS). ...

Downloads: 17 This Week

Last Update: 2026-02-18
See Project
19

Google Cloud Dataflow Template Pipelines

Cloud Dataflow Google-provided templates for solving data tasks

DataflowTemplates is the source repository for Google-provided Dataflow templates that are intended to solve large-scale in-cloud data processing tasks without requiring users to build everything from scratch in a full development environment. The repository is centered on templated pipelines powered by Google Cloud Dataflow and Apache Beam, making it easier to run common integration and movement jobs such as data import, export, backup, restore, and bulk API operations. ...

Downloads: 6 This Week

Last Update: 4 days ago
See Project
20

QuestDB

An open source SQL database designed to process time series data

...It includes endpoints for PostgreSQL wire protocol, high-throughput schema-agnostic ingestion using InfluxDB Line Protocol, and a REST API for queries, bulk imports, and exports. QuestDB implements ANSI SQL with native extensions for time-oriented language features. These extensions make it simple to correlate data from multiple sources using relational and time series joins. QuestDB achieves high performance from a column-oriented storage model, massively-parallelized vector execution, SIMD instructions, and various low-latency techniques. The entire codebase was built from the ground up in Java and C++, with no dependencies, and is 100% free from garbage collection. ...

Downloads: 19 This Week

Last Update: 4 days ago
See Project
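
The QuestDB description above mentions time series joins for correlating data from multiple sources. One common form of such a join is the as-of join, which pairs each row with the most recent row from another series at or before its timestamp. The sketch below shows that idea in plain Python. It is not QuestDB's SQL syntax or implementation; the function name `asof_join` and the sample data are hypothetical, and both inputs are assumed to be sorted by timestamp.

```python
from bisect import bisect_right


def asof_join(left, right):
    # left and right are lists of (timestamp, value), sorted by timestamp.
    # For each left row, attach the latest right value whose timestamp
    # is less than or equal to the left timestamp, or None if there is none.
    right_ts = [ts for ts, _ in right]
    joined = []
    for ts, value in left:
        i = bisect_right(right_ts, ts)  # count of right rows with timestamp <= ts
        match = right[i - 1][1] if i > 0 else None
        joined.append((ts, value, match))
    return joined


# Made-up data: trades matched against the latest known quote.
trades = [(1, "t1"), (5, "t2"), (9, "t3")]
quotes = [(2, 100.0), (4, 101.0), (8, 102.5)]
result = asof_join(trades, quotes)
```

Here the first trade has no earlier quote and gets `None`, while the later trades pick up the quotes at timestamps 4 and 8.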
21

ODD Platform

First open-source data discovery and observability platform

Unlock the power of big data with OpenDataDiscovery Platform. Experience seamless end-to-end insights, powered by unprecedented observability and trust - from ingestion to production - while building your ideal tech stack! Democratize data and accelerate insights. Find data that fits your use case and discover hints left by your peers to leverage existing knowledge.

Downloads: 0 This Week

Last Update: 2026-04-03
See Project
22

Logstash Logback Encoder

Logback JSON encoder and appenders

...Originally written to support output in Logstash's JSON format, but has evolved into a highly configurable, general-purpose, structured logging mechanism for JSON and other Jackson data forms. The structure of the output, and the data it contains, is fully configurable. The general composite JSON encoders/layouts can be used to output any JSON format/data by configuring them with various JSON providers. The Logstash encoders/layouts are really just extensions of the general composite JSON encoders/layouts with a pre-defined set of providers. ...

Downloads: 3 This Week

Last Update: 2025-10-26
See Project
23

Random Cut Forest by AWS

An implementation of the Random Cut Forest data structure

This repository contains implementations of the Random Cut Forest (RCF) probabilistic data structure. RCFs were originally developed at Amazon for use in a nonparametric anomaly detection algorithm for streaming data. Later, new algorithms based on RCFs were developed for density estimation, imputation, and forecasting. The different directories correspond to equivalent implementations in different languages, and bindings to those base implementations, using language-specific features for greater flexibility of use.

Downloads: 5 This Week

Last Update: 2025-09-10
See Project
24

CrateDB

CrateDB is a distributed and scalable SQL database

CrateDB is a distributed SQL database designed for massive machine data and real-time analytics. It combines the scalability and performance of NoSQL with the power and simplicity of SQL, allowing for horizontal scaling, full-text search, and complex queries over large datasets. Built in Java and powered by Elasticsearch and Lucene, CrateDB is optimized for high-velocity data ingestion and dynamic queries.

Downloads: 3 This Week

Last Update: 2026-03-30
See Project
25

Synthea Patient Generator

Synthetic Patient Population Simulator

Synthea™ is an open-source, synthetic patient generator that models the medical history of synthetic patients. Our mission is to provide high-quality, synthetic, realistic but not real, patient data and associated health records covering every aspect of healthcare. The resulting data is free from cost, privacy, and security restrictions, enabling research with Health IT data that is otherwise legally or practically unavailable. The models used to generate synthetic patients are informed by numerous academic publications. Our synthetic populations provide insight into the validity of this research and encourage future studies in population health. ...

Downloads: 2 This Week

Last Update: 2026-03-05
See Project