Showing 351 open source projects for "data processing"

View related business solutions
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build generative AI apps with Vertex AI. Switch between models without switching platforms.
    Start Free
  • Forever Free Full-Stack Observability | Grafana Cloud Icon
    Forever Free Full-Stack Observability | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 1
    Kapacitor

    Kapacitor

    Open source framework for processing, monitoring, and alerting

    Open source framework for processing, monitoring, and alerting on time series data. Kapacitor is a real-time data processing engine for monitoring and alerting, specifically designed to work with time-series data from InfluxDB.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 2
    NYC Taxi Data

    NYC Taxi Data

    Import public NYC taxi and for-hire vehicle (Uber, Lyft)

    The nyc-taxi-data repository is a rich dataset and exploratory project around New York City taxi trip records. It collects and preprocesses large-scale trip datasets (fares, pickup/dropoff, timestamps, locations, passenger counts) to enable data analysis, modeling, and visualization efforts. The project includes scripts and notebooks for cleaning and filtering the raw data, memory-efficient processing for large CSV/Parquet files, and aggregation workflows (e.g. trips per hour, heatmaps of pickups/dropoffs). ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 3
    Numaflow

    Numaflow

    Kubernetes-native platform to run massively parallel data/streaming

    Numaflow is a Kubernetes-native tool for running massively parallel stream processing. A Numaflow Pipeline is implemented as a Kubernetes custom resource and consists of one or more source, data processing, and sink vertices. Numaflow installs in a few minutes and is easier and cheaper to use for simple data processing applications than a full-featured stream processing platform.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 4
    Siddhi Core Libraries

    Siddhi Core Libraries

    Stream Processing and Complex Event Processing Engine

    Fully open source, cloud-native, scalable, micro streaming, and complex event processing system capable of building event-driven applications for use cases such as real-time analytics, data integration, notification management, and adaptive decision-making. Event processing logic can be written using Streaming SQL queries via graphical and source editors, to capture events from diverse data sources, process and analyze them, integrate with multiple services and data stores, and publish output to various endpoints in real time. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • Try Google Cloud Risk-Free With $300 in Credit Icon
    Try Google Cloud Risk-Free With $300 in Credit

    No hidden charges. No surprise bills. Cancel anytime.

    Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
    Start Free
  • 5
    SageMaker Spark Container

    SageMaker Spark Container

    Docker image used to run data processing workloads

    Apache Spark™ is a unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 6
    Pachyderm

    Pachyderm

    Data-Centric Pipelines and Data Versioning

    ...Pachyderm provides a powerful solution to optimize data processing, MLOps, and ML Lifecycles.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 7
    Awesome Fraud Detection Research Papers

    Awesome Fraud Detection Research Papers

    A curated list of data mining papers about fraud detection

    A curated list of data mining papers about fraud detection from several conferences.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    Apache Beam

    Apache Beam

    Unified programming model for Batch and Streaming

    Apache Beam is an open source, unified programming model to define both batch and streaming data-parallel processing pipelines, as well as certain language-specific SDKs for constructing pipelines and Runners. These pipelines are executed on one of Beam’s supported distributed processing back-ends, which include Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow. Beam is especially useful for Embarrassingly Parallel data processing tasks, and caters to the different needs and backgrounds of end users, SDK writers and runner writers.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 9
    Apache Spark

    Apache Spark

    A unified analytics engine for large-scale data processing

    Apache Spark is a unified engine for large-scale data processing, offering APIs for batch jobs, streaming, machine learning, and graph computation. It builds on resilient distributed datasets (RDDs) and the newer DataFrame/Dataset abstractions to provide fault-tolerant, in-memory computation across clusters. Spark’s execution engine handles scheduling, shuffles, caching, and data locality so users can focus on transformations rather than infrastructure plumbing. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 10
    Broadway

    Broadway

    Concurrent and multi-stage data ingestion and data processing

    Broadway is a data processing library for Elixir designed to handle high-throughput, concurrent workloads with ease. It provides an abstraction for defining pipelines that consume data from sources like RabbitMQ, Kafka, Amazon SQS, or custom producers. Each pipeline is fault-tolerant and backpressure-aware, ensuring stable throughput even under load.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    DALI

    DALI

    A GPU-accelerated library containing highly optimized building blocks

    The NVIDIA Data Loading Library (DALI) is a library for data loading and pre-processing to accelerate deep learning applications. It provides a collection of highly optimized building blocks for loading and processing image, video and audio data. It can be used as a portable drop-in replacement for built-in data loaders and data iterators in popular deep learning frameworks.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    Best-of Python

    Best-of Python

    A ranked list of awesome Python open-source libraries

    ...Ranked list of awesome python libraries for web development. Correctly generate plurals, ordinals, indefinite articles; convert numbers. Libraries for loading, collecting, and extracting data from a variety of data sources and formats. Libraries for data batch- and stream-processing, workflow automation, job scheduling, and other data pipeline tasks.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13
    Easy3D

    Easy3D

    Efficient library for processing 3D data

    Easy3D is a lightweight, easy-to-use, and efficient library for processing and rendering 3D data, implemented in C++ with Python bindings. It is designed for tasks such as 3D modeling, geometry processing, and rendering, emphasizing simplicity and efficiency. Easy3D serves as a valuable tool for research, education, and the development of sophisticated 3D applications, providing a solid foundation for handling 3D data.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 14
    Benthos

    Benthos

    Fancy stream processing made operationally mundane

    Benthos is a high performance and resilient stream processor, able to connect various sources and sinks in a range of brokering patterns and perform hydration, enrichments, transformations and filters on payloads. It comes with a powerful mapping language, is easy to deploy and monitor, and ready to drop into your pipeline either as a static binary, docker image, or serverless function, making it cloud native as heck. Delivery guarantees can be a dodgy subject. Benthos processes and...
    Downloads: 58 This Week
    Last Update:
    See Project
  • 15
    Kingfisher

    Kingfisher

    Lightweight, pure-Swift library for downloading images from the web

    Kingfisher is a powerful, pure-Swift library for downloading and caching images from the web. It provides you a chance to use a pure-Swift way to work with remote images in your next app. Asynchronous image downloading and caching. Loading image from either URLSession-based networking or local provided data. Useful image processors and filters provided. Multiple-layer hybrid cache for both memory and disk. Fine control on cache behavior. Customizable expiration date and size limit....
    Downloads: 11 This Week
    Last Update:
    See Project
  • 16
    lxml

    lxml

    The lxml XML toolkit for Python

    A Python library for efficient XML and HTML processing, known for speed and compatibility. The lxml XML toolkit is a Pythonic binding for the C libraries libxml2 and libxslt. It is unique in that it combines the speed and XML feature completeness of these libraries with the simplicity of a native Python API, mostly compatible but superior to the well-known ElementTree API. The latest release works with all CPython versions from 3.6 to 3.12. See the introduction for more information about the...
    Downloads: 25 This Week
    Last Update:
    See Project
  • 17
    spaCy

    spaCy

    Industrial-strength Natural Language Processing (NLP)

    spaCy is a library built on the very latest research for advanced Natural Language Processing (NLP) in Python and Cython. Since its inception it was designed to be used for real world applications-- for building real products and gathering real insights. It comes with pretrained statistical models and word vectors, convolutional neural network models, easy deep learning integration and so much more. spaCy is the fastest syntactic parser in the world according to independent benchmarks, with...
    Downloads: 92 This Week
    Last Update:
    See Project
  • 18
    CGAL

    CGAL

    The Computational Geometry Algorithms Library

    ...These algorithms are useful in a wide range of applications, including computer aided design, robotics, molecular biology, medical imaging, geographic information systems and more. CGAL features a great range of data structures and algorithms, including Voronoi diagrams, cell complexes and polyhedra, triangulations, arrangements of curves, surface and volume mesh generation, spatial searching, alpha shapes, geometry processing, and many more. The use of these result in beautiful, visually complex and accurate representations.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 19
    Point Cloud Library

    Point Cloud Library

    A standalone, large scale, open project for 2D/3D image processing

    The Point Cloud Library (PCL) is a standalone, large scale, open project for 2D/3D image and point cloud processing. PCL is released under the terms of the BSD license, and thus free for commercial and research use. Whether you’ve just discovered PCL or you’re a long time veteran, this page contains links to a set of resources that will help consolidate your knowledge on PCL and 3D processing. An additional Wiki resource for developers is available too. To simplify both usage and...
    Downloads: 16 This Week
    Last Update:
    See Project
  • 20
    Akka

    Akka

    Build concurrent, distributed, and resilient message-driven apps

    ...Small memory footprint; ~2.5 million actors per GB of heap. Distributed systems without single points of failure. Load balancing and adaptive routing across nodes. Event Sourcing and CQRS with Cluster Sharding. Distributed Data for eventual consistency using CRDTs. Asynchronous non-blocking stream processing with backpressure.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 21
    ArrayFire

    ArrayFire

    ArrayFire, a general purpose GPU library

    ArrayFire is a general-purpose tensor library that simplifies the process of software development for the parallel architectures found in CPUs, GPUs, and other hardware acceleration devices. The library serves users in every technical computing market. Data structures in ArrayFire are smartly managed to avoid costly memory transfers and to take advantage of each performance feature provided by the underlying hardware. The community of ArrayFire developers invites you to build with us if...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 22
    Open3D

    Open3D

    A modern library for 3D data processing

    Open3D is an open-source library that supports rapid development of software that deals with 3D data. The Open3D frontend exposes a set of carefully selected data structures and algorithms in both C++ and Python. The backend is highly optimized and is set up for parallelization. Open3D was developed from a clean slate with a small and carefully considered set of dependencies. It can be set up on different platforms and compiled from source with minimal effort. The code is clean, consistently...
    Downloads: 10 This Week
    Last Update:
    See Project
  • 23
    OmniTools

    OmniTools

    Self-hosted collection of powerful web-based tools for everyday tasks

    ...It’s designed to replace the random assortment of “free online tools” people use for quick tasks, while avoiding ads, tracking, and the need to upload sensitive files to unknown servers. A key design choice is that file processing happens entirely on the client side, meaning your data stays in your browser instead of being sent to the backend. The tool catalog spans both technical and non-technical needs, including image, video, audio, PDF, text, date/time, math, and data format utilities like JSON/CSV/XML helpers. It’s also packaged for straightforward self-hosting, with a lightweight Docker image and simple run commands, so it can be deployed quickly on a homelab or internal network.
    Downloads: 23 This Week
    Last Update:
    See Project
  • 24
    Jimp

    Jimp

    An image processing library written entirely in JavaScript for Node

    An image processing library for Node written entirely in JavaScript, with zero native dependencies. If you're using this library with TypeScript the method of importing slightly differs from JavaScript. Instead of using require, you must import it with ES6 default import scheme. If you're using a web bundles (webpack, rollup, parcel) you can benefit from using the module build of jimp. Using the module build will allow your bundler to understand your code better and exclude things you aren't...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 25
    Acl

    Acl

    A powerful server and network library, including coroutine

    The Acl (Advanced C/C++ Library) project a is powerful multi-platform network communication library and service framework, supporting LINUX, WIN32, Solaris, FreeBSD, MacOS, AndroidOS, iOS. Many applications written by Acl run on these devices with Linux, Windows, iPhone and Android and serve billions of users. There are some important modules in Acl project, including network communcation, server framework, application protocols, multiple coders, etc. The common protocols such as...
    Downloads: 11 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next
MongoDB Logo MongoDB