Page 2 | big data free download

Showing 253 open source projects for "big data"

1

Apache Polaris

Apache Polaris, the interoperable, open source catalog

Apache Polaris is an open-source metadata catalog and data management service designed to manage Apache Iceberg tables in modern data lakehouse environments. It provides a centralized catalog that allows multiple compute engines and analytics systems to interact with the same datasets through a standardized interface. By implementing the Iceberg REST catalog API, Polaris enables distributed data platforms to access shared table metadata without tightly coupling storage systems and query...

Downloads: 3 This Week

Last Update: 2026-03-13
See Project
2

marimo

A reactive notebook for Python

marimo is an open-source reactive notebook for Python: reproducible, git-friendly, executable as a script, and shareable as an app. Notebooks are highly interactive, designed for collaboration, and deployable as scripts or apps. Run one cell and marimo reacts by automatically running affected cells, eliminating the error-prone chore of managing notebook state. marimo's reactive UI elements, like data frame GUIs and plots,...

Downloads: 2 This Week

Last Update: 2026-04-10
See Project
3

Nebula Graph

A distributed, fast open-source graph database

The graph database built for super large-scale graphs with milliseconds of latency. Optimized SUBGRAPH and FIND PATH for better performance. Optimized query paths to reduce redundant paths and time complexity. Optimized the method to get properties for better performance of MATCH statements. Nebula Graph adopts the Apache 2.0 license, one of the most permissive free software licenses in the world. Free as in freedom, because, under the Apache 2.0 license, you can use, copy, modify and...

Downloads: 0 This Week

Last Update: 2024-05-17
See Project
4

awesome-single-cell

Community-curated list of software packages and data resources

...The package incorporates novel and established methods to provide a flexible framework for filtering, quality control, normalization, dimension reduction, clustering, differential expression, and a wide range of plotting. An analytical framework for large-scale single cell data. Transform percentage-based units into a 2D space to evaluate changes in distribution with both magnitude and direction.

Downloads: 1 This Week

Last Update: 2026-03-09
See Project
5

Fishing Funds

Status bar app for funds, the big market (market indices), stocks, and virtual currencies

Displays real-time trends of Chinese funds in the menu bar. A small status bar application for funds, market indices, stocks, and virtual currencies, built on Electron, with clients for macOS, Windows, and Linux. Data sources include Tiantian Fund, Ant Fund, Love Fund, Tencent Securities, Sina Fund, and others. The project is based on electron-react-boilerplate-menubar, which in turn builds on Electron React Boilerplate and menubar.

Downloads: 8 This Week

Last Update: 2026-01-16
See Project
6

BFG Repo-Cleaner

Remove large or troublesome blobs

The BFG is a simpler, faster alternative to git-filter-branch for cleansing bad data out of your Git repository history. You can use it for removing crazy big files, and for removing passwords, credentials and other private data. The git-filter-branch command is enormously powerful and can do things that the BFG can't, but the BFG is much better for the tasks above, because it is faster and simpler. The BFG isn't particularly clever, but is focused on making the above tasks easy. ...

Downloads: 9 This Week

Last Update: 2025-01-18
See Project
7

Apache Iceberg

Apache Iceberg

Iceberg is a high-performance format for huge analytic tables. Iceberg brings the reliability and simplicity of SQL tables to big data while making it possible for engines like Spark, Trino, Flink, Presto, Hive, and Impala to safely work with the same tables, at the same time. The core Java library that tracks table snapshots and metadata is complete, but still evolving. Current work is focused on adding row-level deletes and upserts, and integration work with new engines like Flink and Hive. ...

Downloads: 2 This Week

Last Update: 2025-12-22
See Project
8

Cloudberry

One advanced and mature open-source MPP

Apache Cloudberry is a distributed real-time analytics engine designed for querying massive social media datasets. It integrates with Apache AsterixDB and supports efficient ad-hoc queries and aggregations across large volumes of data. Cloudberry is especially useful for dashboards, trend analysis, and time-series social data exploration.

Downloads: 5 This Week

Last Update: 2026-04-13
See Project
9

Magda

A federated data catalog for all your big and small data

Magda is an open-source data catalog system designed to make datasets easier to find, access, and use. Built for government and enterprise use, it supports harvesting metadata from multiple sources, managing data access policies, and integrating with data APIs. Magda is highly customizable and ideal for building open data portals or internal data discovery tools.

Downloads: 2 This Week

Last Update: 2025-12-24
See Project
10

LlamaIndex

Central interface to connect your LLMs with external data

LlamaIndex (GPT Index) provides a central interface to connect your LLMs with external data. It is a simple, flexible interface between your external data and LLMs, offering the following tools in an easy-to-use fashion. It provides indices over your unstructured and structured data for use with LLMs. These indices help to abstract away common boilerplate and pain points for in-context learning, such as dealing with prompt limitations (e.g. 4096 tokens for Davinci) when the context is too big. ...

Downloads: 0 This Week

Last Update: 2026-04-03
See Project
11

Dask

Parallel computing with task scheduling

...It integrates with familiar tools like NumPy, Pandas, and scikit-learn while enabling execution across cores or nodes with minimal code changes. Dask excels at handling large datasets that don’t fit into memory and is widely used in data science, machine learning, and big data pipelines.

Downloads: 2 This Week

Last Update: 2026-03-18
See Project
12

Grafana Alloy

OpenTelemetry Collector distribution with programmable pipelines

Grafana Alloy is Grafana Labs’ open source distribution of the OpenTelemetry Collector. It is an OTLP-compatible collector with built-in Prometheus pipelines and optimizations, and it supports metrics, logs, traces, and profiles. Alloy was started at Grafana Labs and announced at GrafanaCON in 2024. The mission of the project is to create the best...

Downloads: 19 This Week

Last Update: 7 days ago
See Project
13

testng

TestNG testing framework

TestNG is a testing framework inspired by JUnit and NUnit that introduces new functionality to make it more powerful and easier to use. Run your tests in arbitrarily big thread pools with various policies available (all methods in their own thread, one thread per test class, etc.).

Downloads: 4 This Week

Last Update: 2026-01-22
See Project
14

Modin

Scale your Pandas workflows by changing a single line of code

Scale your pandas workflow by changing a single line of code. Modin uses Ray, Dask or Unidist to provide an effortless way to speed up your pandas notebooks, scripts, and libraries. Unlike other distributed DataFrame libraries, Modin provides seamless integration and compatibility with existing pandas code. Even using the DataFrame constructor is identical. It is not necessary to know in advance the available hardware resources in order to use Modin. Additionally, it is not necessary to...

Downloads: 0 This Week

Last Update: 2025-10-02
See Project
15

LakeSoul

An end-to-end, realtime and cloud native Lakehouse framework

LakeSoul is a high-performance, unified table storage framework for big data lakes, supporting both streaming and batch data in a single format. Built on top of Apache Spark and leveraging Apache Arrow and Parquet, LakeSoul provides ACID transactions, schema evolution, and time travel. It is designed for large-scale data lake architectures that require consistency, efficiency, and easy integration with modern data stacks.

Downloads: 1 This Week

Last Update: 2025-09-26
See Project
16

Redash

Connect to any data source, easily visualize and share your data

...It lets you create big, beautiful and easy to digest visualizations on dashboards for better decision-making. Redash supports a multitude of SQL and NoSQL data sources, and can be extended to support even more. Best of all it’s open source, so you can customize and add features to suit your organization’s needs perfectly.

Downloads: 10 This Week

Last Update: 2026-03-02
See Project
17

TimescaleDB

An open-source time-series SQL database optimized for fast ingest

TimescaleDB is the open-source relational database for time-series and analytics. Build powerful data-intensive applications. Become instantly productive with full SQL. Rely on the same PostgreSQL you know, love, and trust. Hyperfunctions make time series easier. Achieve 10-100x faster queries than with vanilla PostgreSQL, InfluxDB, MongoDB. Write millions of data points per second per node. Horizontally scale to petabytes. Don’t worry about cardinality. Simplify your stack, ask more complex...

Downloads: 59 This Week

Last Update: 6 days ago
See Project
18

Vue Json Pretty

A JSON tree view component that is easy to use

A Vue component for rendering JSON data as a tree structure. The CSS file is included separately and needs to be imported manually. You can either import CSS globally in your app (if supported by your framework) or directly from the component.

Downloads: 0 This Week

Last Update: 2025-10-28
See Project
19

ROOT

Analyzing, storing and visualizing big data, scientifically

ROOT is a unified software package for the storage, processing, and analysis of scientific data: from its acquisition to the final visualization in the form of highly customizable, publication-ready plots. It is reliable, performant and well supported, easy to use and obtain, and strives to maximize the quantity and impact of scientific results obtained per unit cost, both of human effort and computing resources. ROOT provides a very efficient storage system for data models, that...

Downloads: 6 This Week

Last Update: 2026-03-14
See Project
20

Volcano

A Cloud Native Batch System (Project under CNCF)

...It provides a suite of mechanisms that are commonly required by many classes of batch and elastic workloads, including machine learning/deep learning, bioinformatics/genomics, and other "big data" applications. These types of applications typically run on generalized domain frameworks like TensorFlow, Spark, Ray, PyTorch, MPI, etc., which Volcano integrates with. Volcano builds upon a decade and a half of experience running a wide variety of high-performance workloads at scale using several systems and platforms, combined with best-of-breed ideas and practices from the open-source community. ...

Downloads: 280 This Week

Last Update: 2026-03-30
See Project
21

Functors.jl

Parameterise all the things

Functors.jl provides tools to express a powerful design pattern for dealing with large, nested structures, as in machine learning and optimization. For large machine learning models, it can be cumbersome or inefficient to work with parameters as one big, flat vector, and structs help manage complexity; but it is also desirable to easily operate over all parameters at once, e.g. for changing precision or applying an optimizer update step.

Downloads: 1 This Week

Last Update: 2024-11-28
See Project
22

Bacalhau

Community-driven, simple, yet powerful framework

Bacalhau is a decentralized compute platform for running jobs on data stored across distributed networks, like IPFS or Filecoin, without moving the data to centralized cloud environments. It allows developers to run containerized workloads close to where the data lives, reducing latency, cost, and privacy risks. Bacalhau supports various runtime environments and is designed to make decentralized data processing as accessible as traditional cloud computing. It’s especially useful for...

Downloads: 2 This Week

Last Update: 2025-06-23
See Project
23

Kinto

A generic JSON document store with sharing and synchronisation options

...Kinto is used at Mozilla and released under the Apache v2 license. It’s hard for frontend developers to respect users' privacy when building applications that work offline, store data remotely and synchronize across devices. Existing solutions either rely on big corporations that crave user data or require a non-trivial amount of time and expertise to set up a new server for every new project. We want to help developers focus on the front, and we don’t want the challenge of storing user data to get in their way. ...

Downloads: 4 This Week

Last Update: 6 days ago
See Project
24

PHP7

PHP7 / Laravel Multi-format Streaming Parser

When parsing XML, CSV, JSON, and similar documents, there are two approaches to consider. DOM loading reads the entire document into memory, which makes it easy to navigate and parse and gives developers maximum flexibility. Streaming iterates through the document like a cursor, stopping at each element along the way, and so avoids excessive memory use. For big files, callbacks are executed while the file is still downloading, which is much more efficient as far as...

Downloads: 7 This Week

Last Update: 2025-05-13
See Project
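
The streaming approach described in the entry above can be illustrated with a minimal sketch. It is written in Python using the standard library's `xml.etree.ElementTree.iterparse`, not the PHP/Laravel package listed here, and the function name `stream_items` and the sample document are hypothetical choices made for illustration.

```python
import io
import xml.etree.ElementTree as ET


def stream_items(source, tag, callback):
    """Call `callback` for each completed <tag> element, one at a time."""
    count = 0
    # iterparse yields elements as their closing tags are read, so the
    # whole document never has to be held in memory at once.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            callback(elem)
            count += 1
            # Drop the element's content once handled to keep memory flat.
            elem.clear()
    return count


# Hypothetical sample input; a real use case would pass a large file.
xml_bytes = b"<items><item id='1'>a</item><item id='2'>b</item></items>"
seen = []
n = stream_items(
    io.BytesIO(xml_bytes),
    "item",
    lambda e: seen.append((e.get("id"), e.text)),
)
print(n, seen)
```

A DOM-style parse (`ET.parse`) would instead build the full tree before any callback could run, which is the trade-off the description draws.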
25

Planetiler

Flexible tool to build planet-scale vector tilesets

...Planetiler packages tiles into an MBTiles (SQLite) or PMTiles file that can be served using tools like TileServer GL or Martin or even queried directly from the browser. See awesome-vector-tiles for more projects that work with data in this format. Planetiler works by mapping input elements to vector tile features, flattening them into a big list, and then sorting by tile ID to group them into tiles.

Downloads: 16 This Week

Last Update: 2026-03-28
See Project
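
The map, flatten, sort, and group pipeline mentioned in the Planetiler description can be sketched in a few lines. This is a toy Python illustration of the general idea only, not Planetiler's actual implementation or API; the `map_element` helper, the integer tile IDs, and the input records are all assumptions made for the example.

```python
from itertools import groupby
from operator import itemgetter


def map_element(element):
    """Hypothetical mapper: yield (tile_id, feature) pairs for one element."""
    for tile_id in element["tiles"]:
        yield tile_id, element["name"]


def build_tiles(elements):
    # Flatten every element's features into one big list.
    features = [pair for element in elements for pair in map_element(element)]
    # Sort by tile ID so features of the same tile become adjacent.
    features.sort(key=itemgetter(0))
    # Group the adjacent runs into tiles.
    return {
        tile_id: [name for _, name in group]
        for tile_id, group in groupby(features, key=itemgetter(0))
    }


# Made-up input elements, each touching one or more tiles.
elements = [
    {"name": "river", "tiles": [2, 1]},
    {"name": "road", "tiles": [1]},
    {"name": "park", "tiles": [3, 2]},
]
tiles = build_tiles(elements)
print(tiles)
```

Sorting first is what lets a single linear pass assemble each tile, since `groupby` only merges neighbouring items with equal keys.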