Showing 288 open source projects for "data processing"

View related business solutions
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • $300 in Free Credit Towards Top Cloud Services Icon
    $300 in Free Credit Towards Top Cloud Services

    Build VMs, containers, AI, databases, storage—all in one place.

    Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.
    Get Started
  • 1
    Riemann

    Riemann

    A network event stream processing system, in Clojure

    Riemann aggregates events from your servers and applications with a powerful stream processing language. Send an email for every exception in your app. Track the latency distribution of your web app. See the top processes on any host, by memory and CPU. Combine statistics from every Riak node in your cluster and forward to Graphite. Track user activity from second to second. Riemann streams are just functions which accept an event. Events are just structs with some common fields like :host...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 2
    GemGIS

    GemGIS

    Spatial data processing for geomodeling

    GemGIS is a Python-based, open-source geographic information processing library. It is capable of preprocessing spatial data such as vector data (shape files, geojson files, geopackages,…), raster data (tif, png,…), data obtained from online services (WCS, WMS, WFS) or XML/KML files (soon). Preprocessed data can be stored in a dedicated Data Class to be passed to the geomodeling package GemPy in order to accelerate the model-building process. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 3
    HStreamDB

    HStreamDB

    HStreamDB is an open-source, cloud-native streaming database

    HStreamDB is an open-source, cloud-native streaming database for IoT and beyond. Modernize your data stack for real-time applications. By subscribing to streams in HStreamDB, any update of the data stream will be pushed to your apps in real-time, and this promotes your apps to be more responsive. You can also replace message brokers with HStreamDB and everything you do with message brokers can be done better with HStreamDB. HStreamDB provides built-in support for event time-based stream processing. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Acl

    Acl

    A powerful server and network library, including coroutine

    The Acl (Advanced C/C++ Library) project a is powerful multi-platform network communication library and service framework, supporting LINUX, WIN32, Solaris, FreeBSD, MacOS, AndroidOS, iOS. Many applications written by Acl run on these devices with Linux, Windows, iPhone and Android and serve billions of users. There are some important modules in Acl project, including network communcation, server framework, application protocols, multiple coders, etc. The common protocols such as...
    Downloads: 14 This Week
    Last Update:
    See Project
  • AI-generated apps that pass security review Icon
    AI-generated apps that pass security review

    Stop waiting on engineering. Build production-ready internal tools with AI—on your company data, in your cloud.

    Retool lets you generate dashboards, admin panels, and workflows directly on your data. Type something like “Build me a revenue dashboard on my Stripe data” and get a working app with security, permissions, and compliance built in from day one. Whether on our cloud or self-hosted, create the internal software your team needs without compromising enterprise standards or control.
    Try Retool free
  • 5
    Benthos

    Benthos

    Fancy stream processing made operationally mundane

    Benthos is a high performance and resilient stream processor, able to connect various sources and sinks in a range of brokering patterns and perform hydration, enrichments, transformations and filters on payloads. It comes with a powerful mapping language, is easy to deploy and monitor, and ready to drop into your pipeline either as a static binary, docker image, or serverless function, making it cloud native as heck. Delivery guarantees can be a dodgy subject. Benthos processes and...
    Downloads: 11 This Week
    Last Update:
    See Project
  • 6
    DSP.jl

    DSP.jl

    Filter design, periodograms, window functions

    DSP.jl provides a number of common digital signal processing routines in Julia.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 7
    CGAL

    CGAL

    The Computational Geometry Algorithms Library

    ...These algorithms are useful in a wide range of applications, including computer aided design, robotics, molecular biology, medical imaging, geographic information systems and more. CGAL features a great range of data structures and algorithms, including Voronoi diagrams, cell complexes and polyhedra, triangulations, arrangements of curves, surface and volume mesh generation, spatial searching, alpha shapes, geometry processing, and many more. The use of these result in beautiful, visually complex and accurate representations.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 8
    collapse

    collapse

    Advanced and Fast Data Transformation in R

    collapse is a high-performance R package designed for fast and efficient data transformation, aggregation, reshaping, and statistical computation. Built to offer a more performant alternative to dplyr and data.table, it is particularly well-suited for large datasets and econometric applications. It operates on base R data structures like data frames and vectors and uses highly optimized C++ code under the hood to deliver significant speed improvements. collapse also includes tools for...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Documind

    Documind

    Open-source platform for extracting structured data from documents

    Documind is an advanced document processing tool that leverages AI to extract structured data from PDFs. It is built to handle PDF conversions, extract relevant information, and format results as specified by customizable schemas.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Forever Free Full-Stack Observability | Grafana Cloud Icon
    Forever Free Full-Stack Observability | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 10
    Logstash

    Logstash

    Centralize, transform and stash your data

    Logstash is a server-side data processing pipeline that dynamically ingests data from numerous sources, transforms it, and ships it to your favorite “stash” regardless of format or complexity. It supports and ingests data of all shapes, sizes and sources, dynamically transforms and prepares this data, and transports it to the output of your choice. Logstash is extensible, with over 200 plugins available to let you create and configure your pipeline how you choose.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 11
    Kestra

    Kestra

    Kestra is an infinitely scalable orchestration and scheduling platform

    Build reliable workflows, blazingly fast, deploy in just a few clicks. Kestra is an open-source, event-driven orchestrator that simplifies data operations and improves collaboration between engineers and business users. By bringing Infrastructure as Code best practices to data pipelines, Kestra allows you to build reliable workflows and manage them with confidence. Thanks to the declarative YAML interface for defining orchestration logic, everyone who benefits from analytics can participate...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 12
    .NET for Apache Spark

    .NET for Apache Spark

    A free, open-source, and cross-platform big data analytics framework

    .NET for Apache Spark provides high-performance APIs for using Apache Spark from C# and F#. With these .NET APIs, you can access the most popular Dataframe and SparkSQL aspects of Apache Spark, for working with structured data, and Spark Structured Streaming, for working with streaming data. .NET for Apache Spark is compliant with .NET Standard - a formal specification of .NET APIs that are common across .NET implementations. This means you can use .NET for Apache Spark anywhere you write...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 13
    Scanopy

    Scanopy

    Clean network diagrams, One-time setup, zero upkeep

    Scanopy is a powerful multi-modal data capture and analysis toolkit that enables users to collect, process, and visualize structured and unstructured information from a variety of sources in a flexible pipeline. It is built to handle complex scanning tasks — such as OCR, document analysis, audio transcription, network data capture, and image extraction — while providing unified APIs and workflows that make managing heterogeneous data sources seamless. Developers can compose custom pipelines...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 14
    Pyper

    Pyper

    Concurrent Python made simple

    Pyper is a Python-native orchestration and scheduling framework designed for modern data workflows, machine learning pipelines, and any task that benefits from a lightweight DAG-based execution engine. Unlike heavier platforms like Airflow, Pyper aims to remain lean, modular, and developer-friendly, embracing Pythonic conventions and minimizing boilerplate. It focuses on local development ergonomics and seamless transition to production environments, making it ideal for small teams and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    ClimateTools.jl

    ClimateTools.jl

    Climate science package for Julia

    Climate analysis tools in Julia. ClimateTools.jl is a collection of commonly-used tools in Climate science. Basics of climate field analysis are covered, with some forays into exploratory techniques associated with climate scenario design. The package is aimed to ease the typical steps of analysis of climate models outputs and gridded datasets (support for weather stations is a work-in-progress). Climate indices and bias correction functions are coded to leverage the use of multiple threads....
    Downloads: 1 This Week
    Last Update:
    See Project
  • 16
    GenStage

    GenStage

    Producer and consumer actors with back-pressure for Elixir

    ...Its clear separation of concerns encourages testable, composable stages that can be rearranged as requirements evolve. In production, this leads to predictable, resilient dataflows for event ingestion, batching, and parallel processing.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    DataChain

    DataChain

    AI-data warehouse to enrich, transform and analyze unstructured data

    ...The resulting datasets can be saved, versioned, and sent directly to PyTorch and TensorFlow for training. Datachain can persist features of Python objects returned by AI models, and enables vectorized analytical operations over them. The typical use cases are data curation, LLM analytics and validation, image segmentation, pose detection, and GenAI alignment. Datachain is especially helpful if batch operations can be optimized – for instance, when synchronous API calls can be parallelized or where an LLM API offers batch processing.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 18
    Fincept Terminal

    Fincept Terminal

    FinceptTerminal is a modern finance application

    Fincept Terminal is an open-source financial intelligence platform aimed at bringing powerful market analysis and investment research tools to a broad audience without the prohibitive cost of proprietary terminals. The project provides both command-line and graphical interfaces that let users access real-time market data, economic indicators, and advanced analytics directly from a unified terminal environment, supporting stocks, forex, commodities, and more. Its architecture integrates Python, TypeScript, Rust, and React, reflecting both a robust data processing backend and a modern desktop UI experience. FinceptTerminal emphasizes AI-powered insights and automation, offering technical and fundamental analysis, sentiment data, and customizable workflows that help traders and analysts make informed decisions efficiently. ...
    Downloads: 13 This Week
    Last Update:
    See Project
  • 19
    Nuclio

    Nuclio

    High-Performance Serverless event and data processing platform

    Nuclio is an open source and managed serverless platform used to minimize development and maintenance overhead and automate the deployment of data-science-based applications. Real-time performance running up to 400,000 function invocations per second. Portable across low laptops, edge, on-prem and multi-cloud deployments. The first serverless platform supporting GPUs for optimized utilization and sharing. Automated deployment to production in a few clicks from Jupyter notebook. Deploy one of...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 20
    CocoIndex

    CocoIndex

    ETL framework to index data for AI, such as RAG

    CocoIndex is an open-source framework designed for building powerful, local-first semantic search systems. It lets users index and retrieve content based on meaning rather than keywords, making it ideal for modern AI-based search applications. CocoIndex leverages vector embeddings and integrates with various models and frameworks, including OpenAI and Hugging Face, to provide high-quality semantic understanding. It’s built for transparency, ease of use, and local control over your search...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Genie

    Genie

    Distributed Big Data Orchestration Service

    Genie is a completely open source distributed job orchestration engine developed by Netflix. Genie provides REST-ful APIs to run a variety of big data jobs like Hadoop, Pig, Hive, Spark, Presto, Sqoop and more. It also provides APIs for managing the metadata of many distributed processing clusters and the commands and applications which run on them.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 22
    Watermill

    Watermill

    Building event-driven applications the easy way in Go

    Go library for building event-driven applications. Our goal was to create a tool that is easy to understand, even by junior developers. It doesn't matter if you want to do Event-driven architecture, CQRS, Event Sourcing or just stream MySQL Binlog to Kafka. Watermill was designed to process hundreds of thousands of messages per second. Every component is built in a way that allows you to configure it for your needs. You can also implement your own middleware for the router. Watermill is...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    Scio

    Scio

    A Scala API for Apache Beam and Google Cloud Dataflow

    Scio is a Scala API developed by Spotify that builds on Apache Beam to enable expressive batch and streaming data pipelines, optimized for running on Google Cloud Dataflow. Inspired by Spark and Scalding, it provides scalable, type‑safe, and production-grade data processing, with built-in support for BigQuery, Pub/Sub, Cassandra, Elasticsearch, Redis, TensorFlow IO, and more.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 24
    Neuroglancer

    Neuroglancer

    WebGL-based viewer for volumetric data

    ...The viewer is built with a multi-threaded architecture, separating rendering and data processing to ensure smooth performance even with massive datasets. Extensively used in neuroscience research, Neuroglancer supports integration with tools.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 25
    Qualitis

    Qualitis

    Qualitis is a one-stop data quality management platform

    Qualitis is a data quality management platform that supports quality verification, notification, and management for various datasource. It is used to solve various data quality problems caused by data processing. Based on Spring Boot, Qualitis submits quality model task to Linkis platform. It provides functions such as data quality model construction, data quality model execution, data quality verification, reports of data quality generation and so on. ...
    Downloads: 0 This Week
    Last Update:
    See Project