Showing 253 open source projects for "big data"

  • 1
    Apache Polaris

    Apache Polaris, the interoperable, open source catalog

    Apache Polaris is an open-source metadata catalog and data management service designed to manage Apache Iceberg tables in modern data lakehouse environments. It provides a centralized catalog that allows multiple compute engines and analytics systems to interact with the same datasets through a standardized interface. By implementing the Iceberg REST catalog API, Polaris enables distributed data platforms to access shared table metadata without tightly coupling storage systems and query...
    Downloads: 3 This Week
  • 2
    marimo

    A reactive notebook for Python

    marimo is an open-source reactive notebook for Python: reproducible, git-friendly, executable as a script, and shareable as an app. marimo notebooks are highly interactive, designed for collaboration, and built for the modern Pythonista. Run one cell and marimo reacts by automatically running the affected cells, eliminating the error-prone chore of managing notebook state. marimo's reactive UI elements, like data frame GUIs and plots,...
    Downloads: 2 This Week
  • 3
    Nebula Graph

    A distributed, fast open-source graph database

    The graph database built for super large-scale graphs with millisecond latency. Optimized SUBGRAPH and FIND PATH for better performance. Optimized query paths to reduce redundant paths and time complexity. Optimized the method to get properties for better performance of MATCH statements. Nebula Graph adopts the Apache 2.0 license, one of the most permissive free software licenses in the world. Free as in freedom, because, under the Apache 2.0 license, you can use, copy, modify and...
    Downloads: 0 This Week
  • 4
    awesome-single-cell

    Community-curated list of software packages and data resources

    ...The package incorporates novel and established methods to provide a flexible framework to perform filtering, quality control, normalization, dimension reduction, clustering, differential expression and a wide range of plotting. An analytical framework for large-scale single-cell data. Transforms percentage-based units into a 2D space to evaluate changes in distribution with both magnitude and direction.
    Downloads: 1 This Week
  • 5
    Fishing Funds

    Status bar app showing funds, market indices, stocks, and virtual currencies

    Displays real-time trends of Chinese funds in the menu bar. Fishing Funds is a small status bar application for funds, market indices, stocks, and virtual currencies, developed with Electron and supporting macOS, Windows, and Linux clients. Data sources include Tiantian Fund, Ant Fund, Love Fund, Tencent Securities, Sina Fund, etc. The project is based on electron-react-boilerplate-menubar, which is built on Electron React Boilerplate and menubar.
    Downloads: 8 This Week
  • 6
    BFG Repo-Cleaner

    Remove large or troublesome blobs

    The BFG is a simpler, faster alternative to git-filter-branch for cleansing bad data out of your Git repository history. You can use it for removing crazy big files, and for removing passwords, credentials and other private data. The git-filter-branch command is enormously powerful and can do things that the BFG can't, but the BFG is much better for the tasks above because it is faster and simpler. The BFG isn't particularly clever, but is focused on making the above tasks easy. ...
    Downloads: 9 This Week
  • 7
    Apache Iceberg

    Iceberg is a high-performance format for huge analytic tables. Iceberg brings the reliability and simplicity of SQL tables to big data while making it possible for engines like Spark, Trino, Flink, Presto, Hive, and Impala to safely work with the same tables, at the same time. The core Java library that tracks table snapshots and metadata is complete, but still evolving. Current work is focused on adding row-level deletes and upserts, and integration work with new engines like Flink and Hive. ...
    Downloads: 2 This Week
  • 8
    Cloudberry

    An advanced and mature open-source MPP database

    Apache Cloudberry is an advanced and mature open-source massively parallel processing (MPP) database for large-scale analytics, developed as an open-source alternative to Greenplum Database.
    Downloads: 5 This Week
  • 9
    Magda

    A federated data catalog for all your big and small data

    Magda is an open-source data catalog system designed to make datasets easier to find, access, and use. Built for government and enterprise use, it supports harvesting metadata from multiple sources, managing data access policies, and integrating with data APIs. Magda is highly customizable and ideal for building open data portals or internal data discovery tools.
    Downloads: 2 This Week
  • 10
    LlamaIndex

    Central interface to connect your LLMs with external data

    LlamaIndex (GPT Index) is a project that provides a central interface to connect your LLMs with external data. It is a simple, flexible interface between your external data and LLMs, offering the following tools in an easy-to-use fashion. It provides indices over your unstructured and structured data for use with LLMs. These indices help abstract away common boilerplate and pain points for in-context learning, such as dealing with prompt limitations (e.g., 4096 tokens for Davinci) when the context is too big. ...
    Downloads: 0 This Week
  • 11
    Dask

    Parallel computing with task scheduling

    ...It integrates with familiar tools like NumPy, Pandas, and scikit-learn while enabling execution across cores or nodes with minimal code changes. Dask excels at handling large datasets that don’t fit into memory and is widely used in data science, machine learning, and big data pipelines.
    Downloads: 2 This Week
  • 12
    Grafana Alloy

    OpenTelemetry Collector distribution with programmable pipelines

    Grafana Alloy is Grafana Labs’ open source distribution of the OpenTelemetry Collector: an OTLP-compatible collector with built-in Prometheus pipelines and optimizations that supports signals across metrics, logs, traces, and profiles. Alloy was started at Grafana Labs and announced at GrafanaCON in 2024. The mission of the project is to create the best...
    Downloads: 19 This Week
  • 13
    testng

    TestNG testing framework

    TestNG is a testing framework inspired by JUnit and NUnit that introduces new functionality making it more powerful and easier to use. Run your tests in arbitrarily large thread pools with various policies available (all methods in their own thread, one thread per test class, etc.).
    Downloads: 4 This Week
  • 14
    Modin

    Scale your Pandas workflows by changing a single line of code

    Scale your pandas workflow by changing a single line of code. Modin uses Ray, Dask or Unidist to provide an effortless way to speed up your pandas notebooks, scripts, and libraries. Unlike other distributed DataFrame libraries, Modin provides seamless integration and compatibility with existing pandas code. Even using the DataFrame constructor is identical. It is not necessary to know in advance the available hardware resources in order to use Modin. Additionally, it is not necessary to...
    Downloads: 0 This Week
  • 15
    LakeSoul

    An end-to-end, realtime and cloud native Lakehouse framework

    LakeSoul is a high-performance, unified table storage framework for big data lakes, supporting both streaming and batch data in a single format. Built on top of Apache Spark and leveraging Apache Arrow and Parquet, LakeSoul provides ACID transactions, schema evolution, and time travel. It is designed for large-scale data lake architectures that require consistency, efficiency, and easy integration with modern data stacks.
    Downloads: 1 This Week
  • 16
    Redash

    Connect to any data source, easily visualize and share your data

    ...It lets you create big, beautiful and easy to digest visualizations on dashboards for better decision-making. Redash supports a multitude of SQL and NoSQL data sources, and can be extended to support even more. Best of all it’s open source, so you can customize and add features to suit your organization’s needs perfectly.
    Downloads: 10 This Week
  • 17
    TimescaleDB

    An open-source time-series SQL database optimized for fast ingest

    TimescaleDB is the open-source relational database for time-series and analytics. Build powerful data-intensive applications. Become instantly productive with full SQL. Rely on the same PostgreSQL you know, love, and trust. Hyperfunctions make time series easier. Achieve 10-100x faster queries than with vanilla PostgreSQL, InfluxDB, MongoDB. Write millions of data points per second per node. Horizontally scale to petabytes. Don’t worry about cardinality. Simplify your stack, ask more complex...
    Downloads: 59 This Week
  • 18
    Vue Json Pretty

    A JSON tree view component that is easy to use

    A Vue component for rendering JSON data as a tree structure. The CSS file is included separately and needs to be imported manually. You can either import CSS globally in your app (if supported by your framework) or directly from the component.
    Downloads: 0 This Week
  • 19
    ROOT

    Analyzing, storing and visualizing big data, scientifically

    ROOT is a unified software package for the storage, processing, and analysis of scientific data: from its acquisition to the final visualization in the form of highly customizable, publication-ready plots. It is reliable, performant and well supported, easy to use and obtain, and strives to maximize the quantity and impact of scientific results obtained per unit cost, both of human effort and computing resources. ROOT provides a very efficient storage system for data models, that...
    Downloads: 6 This Week
  • 20
    Volcano

    A Cloud Native Batch System (Project under CNCF)

    ...It provides a suite of mechanisms that are commonly required by many classes of batch & elastic workload including machine learning/deep learning, bioinformatics/genomics, and other "big data" applications. These types of applications typically run on generalized domain frameworks like TensorFlow, Spark, Ray, PyTorch, MPI, etc., which Volcano integrates with. Volcano builds upon a decade and a half of experience running a wide variety of high-performance workloads at scale using several systems and platforms, combined with best-of-breed ideas and practices from the open-source community. ...
    Downloads: 280 This Week
  • 21
    Functors.jl

    Parameterise all the things

    Functors.jl provides tools to express a powerful design pattern for dealing with large, nested structures, as in machine learning and optimization. For large machine learning models, it can be cumbersome or inefficient to work with parameters as one big, flat vector, and structs help manage complexity; but it is also desirable to easily operate over all parameters at once, e.g. for changing precision or applying an optimizer update step.
    Downloads: 1 This Week
  • 22
    Bacalhau

    Community-driven, simple, yet powerful framework

    Bacalhau is a decentralized compute platform for running jobs on data stored across distributed networks, like IPFS or Filecoin, without moving the data to centralized cloud environments. It allows developers to run containerized workloads close to where the data lives, reducing latency, cost, and privacy risks. Bacalhau supports various runtime environments and is designed to make decentralized data processing as accessible as traditional cloud computing. It’s especially useful for...
    Downloads: 2 This Week
  • 23
    Kinto

    A generic JSON document store with sharing and synchronisation options

    ...Kinto is used at Mozilla and released under the Apache v2 license. It’s hard for frontend developers to respect users' privacy when building applications that work offline, store data remotely and synchronize across devices. Existing solutions either rely on big corporations that crave user data or require a non-trivial amount of time and expertise to set up a new server for every new project. We want to help developers focus on the front, and we don’t want the challenge of storing user data to get in their way. ...
    Downloads: 4 This Week
  • 24
    PHP7

    PHP7 / Laravel Multi-format Streaming Parser

    When it comes to parsing XML/CSV/JSON/... documents, there are two approaches to consider. DOM loading loads the entire document into memory, making it easy to navigate and parse, and as such provides maximum flexibility for developers. Streaming iterates through the document like a cursor, stopping at each element along the way, thus avoiding memory overkill. For big files, callbacks can therefore run while the file is still downloading, which is much more efficient as far as...
    Downloads: 7 This Week
  • 25
    Planetiler

    Flexible tool to build planet-scale vector tilesets

    ...Planetiler packages tiles into an MBTiles (SQLite) or PMTiles file that can be served using tools like TileServer GL or Martin or even queried directly from the browser. See awesome-vector-tiles for more projects that work with data in this format. Planetiler works by mapping input elements to vector tile features, flattening them into a big list, and then sorting by tile ID to group them into tiles.
    Downloads: 16 This Week
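The marimo entry says that running one cell makes marimo automatically run the affected cells. A minimal Python sketch of that idea follows, assuming a hypothetical `Cell` record that lists the variable names each cell defines and reads; marimo's real implementation is not shown here and is certainly more involved. Running a cell re-executes it and every cell downstream of it, in dependency order, while unrelated cells are left alone. It uses `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter


class Cell:
    """Hypothetical notebook cell: the names it defines, the names it reads, and its body."""

    def __init__(self, name, defines, refs, fn):
        self.name = name
        self.defines = set(defines)
        self.refs = set(refs)
        self.fn = fn  # fn(namespace) -> dict of names this cell (re)defines


def affected_cells(cells, start):
    """Return `start` plus every cell downstream of it, in a valid execution order."""
    # Which cell defines each variable name.
    definer = {var: c.name for c in cells.values() for var in c.defines}
    # Edge parent -> child whenever the child reads a name the parent defines.
    children = {name: set() for name in cells}
    for c in cells.values():
        for var in c.refs:
            if var in definer:
                children[definer[var]].add(c.name)
    # Everything reachable from `start` is stale and must be re-run.
    stale, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in stale:
            stale.add(node)
            stack.extend(children[node])
    # TopologicalSorter takes node -> predecessors; keep only edges among stale cells.
    graph = {n: {p for p in stale if n in children[p]} for n in stale}
    return list(TopologicalSorter(graph).static_order())


def run_cell(cells, start, namespace):
    """Re-run `start` and its dependents, updating the shared namespace."""
    order = affected_cells(cells, start)
    for name in order:
        namespace.update(cells[name].fn(namespace))
    return order


cells = {
    "a": Cell("a", ["x"], [], lambda ns: {"x": 2}),
    "b": Cell("b", ["y"], ["x"], lambda ns: {"y": ns["x"] * 10}),
    "c": Cell("c", ["z"], ["y"], lambda ns: {"z": ns["y"] + 1}),
    "d": Cell("d", ["w"], [], lambda ns: {"w": "unrelated"}),
}
ns = {}
print(run_cell(cells, "a", ns))  # ['a', 'b', 'c']
print(ns["z"])  # 21
```

Ordering with `TopologicalSorter` also surfaces circular definitions as a `graphlib.CycleError`, which is one reasonable way to reject them.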
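The LlamaIndex entry mentions prompt limitations (e.g., 4096 tokens for Davinci) when the context is too big. One common way to cope, sketched below in Python, is to split documents into token-bounded chunks and pass only the most relevant chunk to the model. Whitespace tokenization and keyword-overlap scoring are simplifying assumptions, and `chunk_text` and `best_chunk` are hypothetical helpers, not LlamaIndex's API; real systems typically use the model's tokenizer and embedding similarity.

```python
import re


def chunk_text(text, max_tokens, overlap=0):
    """Split text into windows of at most `max_tokens` tokens, sharing `overlap` tokens.

    Whitespace splitting is a crude stand-in for a real tokenizer (assumption).
    """
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")
    tokens = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


def words(s):
    """Lowercased word set with punctuation stripped."""
    return set(re.findall(r"\w+", s.lower()))


def best_chunk(chunks, query):
    """Naive retrieval: pick the chunk sharing the most words with the query."""
    q = words(query)
    return max(chunks, key=lambda c: len(q & words(c)))


doc = "Iceberg tracks snapshots. Dask schedules tasks. Modin scales pandas."
chunks = chunk_text(doc, max_tokens=3)
print(chunks)  # ['Iceberg tracks snapshots.', 'Dask schedules tasks.', 'Modin scales pandas.']
print(best_chunk(chunks, "How does Dask schedule tasks?"))  # Dask schedules tasks.
```

A nonzero `overlap` keeps sentences that straddle a chunk boundary partially visible in both neighbouring chunks.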
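The Dask entry notes that Dask handles datasets that don't fit into memory. The core idea, sketched below in plain Python, is to split the data into partitions, reduce each partition to a small summary, and combine the summaries, so only one partition is resident at a time. This is a sequential illustration with hypothetical helper names, not Dask's API; Dask itself builds a task graph from such per-partition steps and schedules them across cores or nodes.

```python
from itertools import islice


def partitions(iterable, size):
    """Yield successive lists of at most `size` items, one partition at a time."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


def chunked_mean(iterable, size=1000):
    """Mean over a stream, holding at most one partition in memory."""
    total, count = 0, 0
    for chunk in partitions(iterable, size):
        # Per-partition step: reduce the chunk to a (sum, count) summary, then combine.
        total += sum(chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("mean of empty data")
    return total / count


# A generator stands in for data too large to load at once.
print(chunked_mean((i for i in range(1_000_001)), size=50_000))  # 500000.0
```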
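The Functors.jl entry describes operating over all parameters of a large, nested structure at once (for example, to change precision or apply an optimizer step) without flattening them into one big vector. Below is a rough Python analogue of that pattern: a recursive `fmap` over dataclasses, dicts, lists, and tuples that rebuilds the same shape. It is not Functors.jl's Julia API, and the `Dense` and `Chain` types are hypothetical.

```python
from dataclasses import dataclass, fields, is_dataclass, replace


@dataclass(frozen=True)
class Dense:  # hypothetical layer
    weight: list
    bias: list


@dataclass(frozen=True)
class Chain:  # hypothetical container of layers
    layers: tuple


def fmap(f, x):
    """Apply f to every leaf of a nested structure, rebuilding the same shape."""
    if is_dataclass(x) and not isinstance(x, type):
        return replace(x, **{fld.name: fmap(f, getattr(x, fld.name)) for fld in fields(x)})
    if isinstance(x, dict):
        return {k: fmap(f, v) for k, v in x.items()}
    if isinstance(x, (list, tuple)):
        return type(x)(fmap(f, v) for v in x)
    return f(x)  # anything else is treated as a leaf


def leaves(x):
    """Collect leaves in traversal order (fmap used only for its side effect)."""
    out = []
    fmap(lambda v: out.append(v) or v, x)
    return out


model = Chain(layers=(Dense([1.0, 2.0], [0.5]), Dense([4.0], [1.0])))
halved = fmap(lambda w: w / 2, model)  # e.g. a uniform scaling step
print(leaves(halved))  # [0.5, 1.0, 0.25, 2.0, 0.5]
```

Because `fmap` returns new objects, the original `model` is left untouched, which keeps updates easy to reason about.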
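The PHP7 streaming parser entry contrasts DOM loading with streaming. The same streaming idea can be sketched in Python with the standard library's `xml.etree.ElementTree.iterparse`, which yields each element as soon as its closing tag has been parsed; this swaps in Python's parser for illustration and is not the PHP library's API. The `stream_elements` helper and the sample XML are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def stream_elements(source, tag, callback):
    """Call `callback` on each completed <tag> element instead of loading a full DOM first."""
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            callback(elem)
            count += 1
            # Free this element's children and text; the now-empty element
            # stays attached to its parent, so memory stays small but not zero.
            elem.clear()
    return count


data = (b"<orders><order id='1'><total>9.5</total></order>"
        b"<order id='2'><total>20</total></order></orders>")
totals = []
n = stream_elements(io.BytesIO(data), "order",
                    lambda e: totals.append(float(e.findtext("total"))))
print(n, totals)  # 2 [9.5, 20.0]
```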
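The Planetiler entry says it maps input elements to vector tile features, flattens them into a big list, and sorts by tile ID to group them into tiles. A toy Python version of that map, sort, and group pipeline is below. The lon/lat to tile conversion is the standard Web Mercator (slippy map) formula, but the row-major `tile_id` encoding and the point-only input are simplifying assumptions; Planetiler's actual tile ID scheme and geometry handling are not reproduced here.

```python
import math
from itertools import groupby


def lonlat_to_tile(lon, lat, z):
    """Standard Web Mercator (slippy map) tile x/y at zoom z, clamped to the grid."""
    n = 2 ** z
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return max(0, min(x, n - 1)), max(0, min(y, n - 1))


def tile_id(x, y, z):
    # Hypothetical row-major encoding for illustration only.
    return y * (2 ** z) + x


def build_tiles(points, z):
    # 1. Map each input element to a (tile_id, feature) pair in one flat list.
    features = []
    for name, lon, lat in points:
        x, y = lonlat_to_tile(lon, lat, z)
        features.append((tile_id(x, y, z), name))
    # 2. Sort by tile ID (stable, so input order is kept within a tile).
    features.sort(key=lambda f: f[0])
    # 3. Group consecutive runs with the same ID into tiles.
    return {tid: [name for _, name in grp]
            for tid, grp in groupby(features, key=lambda f: f[0])}


points = [("London", -0.13, 51.5), ("Sydney", 151.2, -33.9),
          ("Paris", 2.35, 48.86), ("New York", -74.0, 40.7)]
print(build_tiles(points, 1))  # {0: ['London', 'New York'], 1: ['Paris'], 3: ['Sydney']}
```

Sorting once and then grouping adjacent runs avoids holding a separate bucket per tile while features are being produced, which matters at planet scale.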