data processing free download

Showing 1439 open source projects for "data processing"


1

Data-Juicer

Data processing for and with foundation models

Data-Juicer is an open-source data processing and augmentation framework designed to enhance the quality and diversity of datasets for machine learning tasks. It includes a modular pipeline for scalable data transformation.

Downloads: 0 This Week

Last Update: 2026-05-29
See Project
2

Data Formulator

Create rich visualizations with AI

To create rich visualizations, data analysts often need to iterate back and forth between data processing and chart specification. Doing so requires not only proficiency in data transformation and visualization tools but also effort to manage a branching history made up of many different versions of data and charts. Recent LLM-powered AI systems have greatly improved visualization authoring experiences, for example by mitigating manual data transformation barriers via LLMs' code generation ability. ...

Downloads: 1 This Week

Last Update: 2026-05-28
See Project
3

Polymarket Data

Polymarket Data Retriever that fetches, processes, and structures data

Polymarket Data is a comprehensive data engineering pipeline designed to collect, process, and structure trading activity from the Polymarket prediction market ecosystem into analyzable datasets. The system operates as a multi-stage pipeline that integrates data from both off-chain APIs and on-chain event sources, enabling users to reconstruct full trading activity including markets, order events, and executed trades. It begins by fetching market metadata such as questions, outcomes, and...

Downloads: 0 This Week

Last Update: 2026-04-27
See Project
4

Synthetic Data Generator

SDG is a specialized framework

...It also includes a data processing module capable of handling different data types, preprocessing columns, managing missing values, and converting formats automatically before model training.

Downloads: 0 This Week

Last Update: 2026-03-06
See Project
5

NYC Taxi Data

Import public NYC taxi and for-hire vehicle (Uber, Lyft)

The nyc-taxi-data repository is a rich dataset and exploratory project around New York City taxi trip records. It collects and preprocesses large-scale trip datasets (fares, pickup/dropoff, timestamps, locations, passenger counts) to enable data analysis, modeling, and visualization efforts. The project includes scripts and notebooks for cleaning and filtering the raw data, memory-efficient processing for large CSV/Parquet files, and aggregation workflows (e.g. trips per hour, heatmaps of pickups/dropoffs). ...

Downloads: 1 This Week

Last Update: 2025-10-01
See Project
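The "trips per hour" aggregation mentioned in the nyc-taxi-data description can be sketched in a few lines of Python. This is a minimal illustration, not code from the repository: the function name `trips_per_hour`, the timestamp format, and the sample values are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable


def trips_per_hour(pickup_times: Iterable[str]) -> Dict[int, int]:
    """Count trips by the hour of day (0-23) of their pickup timestamp.

    Assumes ISO-like timestamps such as "2024-01-01 08:15:00";
    the real dataset's column names and formats may differ.
    """
    counts: Counter = Counter()
    for ts in pickup_times:
        counts[datetime.fromisoformat(ts).hour] += 1
    # Return a plain dict ordered by hour for readable output.
    return dict(sorted(counts.items()))


# Hypothetical sample data, invented for this sketch.
sample = [
    "2024-01-01 08:15:00",
    "2024-01-01 08:45:00",
    "2024-01-01 17:05:00",
]
hourly = trips_per_hour(sample)  # {8: 2, 17: 1}
```

Because the function consumes any iterable, it could be fed rows streamed from a large file rather than a list held in memory.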
6

Agentic Data Scientist

An end-to-end Data Scientist

...Each agent is designed to independently call functions, interact with data sources, and adapt to uncertainties during processing, enabling iterative refinement of models without manual coordination. The framework supports interoperability with existing data tools and libraries, letting the agents leverage libraries like pandas, scikit-learn, and visualization frameworks to perform real computations rather than mock demonstrations.

Downloads: 0 This Week

Last Update: 2026-05-29
See Project
7

MeshLab

The open source mesh processing system

...VCG can be used as a stand-alone large-scale automated mesh processing pipeline, while MeshLab makes it easy to experiment with its algorithms interactively. MeshLab is an open source system for processing and editing 3D triangular meshes. It provides a set of tools for editing, cleaning, healing, inspecting, rendering, texturing and converting meshes. It offers features for processing raw data produced by 3D digitization tools and devices and for preparing models for 3D printing.

Downloads: 104 This Week

Last Update: 2025-07-22
See Project
8

CyberChef

A web app for encryption, encoding, compression and data analysis

CyberChef, developed by GCHQ, is a versatile web application dubbed the "Cyber Swiss Army Knife." It enables users to perform a wide array of operations on data, including encryption, encoding, compression, and analysis, all within a browser interface.

Downloads: 63 This Week

Last Update: 6 days ago
See Project
9

Numaflow

Kubernetes-native platform to run massively parallel data/streaming

Numaflow is a Kubernetes-native tool for running massively parallel stream processing. A Numaflow Pipeline is implemented as a Kubernetes custom resource and consists of one or more source, data processing, and sink vertices. Numaflow installs in a few minutes and is easier and cheaper to use for simple data processing applications than a full-featured stream processing platform.

Downloads: 3 This Week

Last Update: 6 days ago
See Project
10

pdfcpu

A PDF processor written in Go

pdfcpu is a PDF processing library written in Go that supports encryption. It provides both an API and a CLI, and it supports all versions up to PDF 1.7 (ISO-32000). The project is an effort to build a comprehensive PDF processing library in Go from the ground up. Over time pdfcpu aims to support the standard range of PDF processing features and also any interesting use cases that may present themselves along the way. The main focus lies on strong support for batch processing and scripting via a...

Downloads: 14 This Week

Last Update: 2026-06-09
See Project
11

Kapacitor

Open source framework for processing, monitoring, and alerting

Open source framework for processing, monitoring, and alerting on time series data. Kapacitor is a real-time data processing engine for monitoring and alerting, specifically designed to work with time-series data from InfluxDB.

Downloads: 0 This Week

Last Update: 2026-05-26
See Project
12

go-streams

A lightweight stream processing library for Go

A lightweight stream processing library for Go. go-streams provides a simple and concise DSL to build data pipelines. In computing, a pipeline, also known as a data pipeline, is a set of data processing elements connected in series, where the output of one element is the input of the next one. The elements of a pipeline are often executed in parallel or in time-sliced fashion.

Downloads: 0 This Week

Last Update: 2025-05-10
See Project
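The pipeline definition quoted in the go-streams description (elements connected in series, each consuming the previous element's output) can be illustrated with a minimal Python sketch. It does not use go-streams or its DSL; the stage names `source`, `parse`, `keep_even`, and `sink` are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Iterator, List


def source(lines: Iterable[str]) -> Iterator[str]:
    """First element: emit raw records one at a time."""
    for line in lines:
        yield line


def parse(records: Iterable[str]) -> Iterator[int]:
    """Second element: turn each raw record into an integer."""
    for record in records:
        yield int(record.strip())


def keep_even(numbers: Iterable[int]) -> Iterator[int]:
    """Third element: pass through only even numbers."""
    for n in numbers:
        if n % 2 == 0:
            yield n


def sink(numbers: Iterable[int]) -> List[int]:
    """Last element: collect whatever reaches the end of the pipeline."""
    return list(numbers)


def run_pipeline(lines: Iterable[str]) -> List[int]:
    # The output of each element is the input of the next one.
    return sink(keep_even(parse(source(lines))))


result = run_pipeline([" 1", "2 ", "3", "4"])  # [2, 4]
```

Because generators are lazy, each record flows through every stage before the next record is read, so the stages interleave instead of running to completion one after another.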
13

LOTUS

AI-Powered Data Processing: Use LOTUS to process all of your datasets

LOTUS is an open-source framework and query engine designed to enable efficient processing of structured and unstructured datasets using large language models. The system provides a declarative programming model that allows developers to express complex AI data operations using high-level commands rather than manually orchestrating model calls. It offers a Python interface with a Pandas-like API, making it familiar for data scientists and engineers already working with data analysis libraries. ...

Downloads: 9 This Week

Last Update: 2026-06-13
See Project
14

Arroyo

Distributed stream processing engine in Rust

Arroyo is a distributed stream processing engine written in Rust, designed to efficiently perform stateful computations on streams of data. Unlike traditional batch processing, streaming engines can operate on both bounded and unbounded sources, emitting results as soon as they are available.

Downloads: 0 This Week

Last Update: 2025-12-01
See Project
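The idea of a stateful computation that emits results as soon as they are available can be sketched in plain Python. This is not Arroyo's API; `running_counts` is a hypothetical function that keeps per-key state and yields an updated count for every incoming event, whether the input is a finite list or an endless stream.

```python
from collections import defaultdict
from itertools import cycle, islice
from typing import Dict, Iterable, Iterator, Tuple


def running_counts(events: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (key, count so far) immediately after each event arrives."""
    counts: Dict[str, int] = defaultdict(int)  # state kept across events
    for key in events:
        counts[key] += 1
        yield key, counts[key]


# Bounded source: a finite list of events.
bounded = list(running_counts(["a", "b", "a"]))
# [("a", 1), ("b", 1), ("a", 2)]

# Unbounded source: an endless stream; take only the first three results.
unbounded = list(islice(running_counts(cycle(["x"])), 3))
# [("x", 1), ("x", 2), ("x", 3)]
```

The same function serves both cases because it never waits for the input to end before producing output.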
15

LAStools

Efficient tools for LiDAR processing

LAStools is a collection of efficient, multi-core, scriptable tools for processing LiDAR data. It supports various formats, including LAS, LAZ, Terrasolid BIN, and ESRI Shapefiles, providing a comprehensive suite for LiDAR data management and analysis.

Downloads: 7 This Week

Last Update: 2026-05-05
See Project
16

Bytewax

Python Stream Processing

...Bytewax is a Python framework and Rust distributed processing engine that uses a dataflow computational model to provide parallelizable stream processing and event processing capabilities similar to Flink, Spark, and Kafka Streams. You can use Bytewax for a variety of workloads, from moving data in the style of Kafka Connect all the way to advanced online machine learning workloads. Bytewax is not limited to streaming applications but excels anywhere that data can be distributed at the input and output.

Downloads: 0 This Week

Last Update: 2024-11-25
See Project
17

ThingsBoard

Device management, data collection, processing and visualization

...Define relations between your devices, assets, customers or any other entities. Collect and store telemetry data in a scalable and fault-tolerant way. Visualize your data with built-in or custom widgets and flexible dashboards. Share dashboards with your customers. Define data processing rule chains. Transform and normalize your device data. Raise alarms on incoming telemetry events, attribute updates, device inactivity, and user actions.

Downloads: 8 This Week

Last Update: 2026-05-27
See Project
18

The Grand Complete Data Science Guide

Data Science Guide With Videos And Materials

The Grand Complete Data Science Materials is a repository curated by a data-science educator that aggregates a wide range of learning resources — from basic programming and math foundation to advanced topics in machine learning, deep learning, natural language processing, computer vision, and deployment practices — into a structured, centralized collection aimed at learners seeking a comprehensive path to data science mastery.

Downloads: 0 This Week

Last Update: 2025-12-02
See Project
19

cobalt

Video and media downloader: Best way to save what you love

Cobalt is an open-source media downloader and tool designed to provide a high-performance and privacy-focused alternative for interacting with online media content, particularly focused on downloading and processing media from various platforms. It emphasizes speed, reliability, and a clean user experience, allowing users to retrieve media without unnecessary tracking, ads, or intrusive elements commonly found in web-based tools. The project is built with performance in mind, leveraging efficient backend processing to handle requests quickly and consistently. ...

Downloads: 64 This Week

Last Update: 2026-04-06
See Project
20

GLM

OpenGL Mathematics (GLM)

...This project isn't limited to GLSL features. An extension system, based on the GLSL extension conventions, provides extended capabilities: matrix transformations, quaternions, data packing, random numbers, noise, etc. This library works perfectly with OpenGL but it also ensures interoperability with other third-party libraries and SDKs. It is a good candidate for software rendering (raytracing / rasterisation), image processing, physics simulations and any development context that requires a simple and convenient mathematics library. ...

Downloads: 70 This Week

Last Update: 2025-12-31
See Project
21

Pathway

Python ETL framework for stream processing, real-time analytics, LLM

...Unlike traditional batch processing frameworks, Pathway continuously updates the results of your data logic as new events arrive, functioning more like a database that reacts in real-time. It supports Python, integrates with modern data tools, and offers a deterministic dataflow model to ensure reproducibility and correctness.

Downloads: 0 This Week

Last Update: 2026-06-12
See Project
22

Awesome Fraud Detection Research Papers

A curated list of data mining papers about fraud detection

A curated list of data mining papers about fraud detection from several conferences.

Downloads: 0 This Week

Last Update: 2026-01-05
See Project
23

ExtractThinker

ExtractThinker is a Document Intelligence library for LLMs

ExtractThinker is a tool designed to facilitate the extraction and analysis of information from various data sources, aiding in data processing and knowledge discovery.

Downloads: 0 This Week

Last Update: 2025-06-09
See Project
24

Pachyderm

Data-Centric Pipelines and Data Versioning

...Pachyderm provides a powerful solution to optimize data processing, MLOps, and ML Lifecycles.

Downloads: 0 This Week

Last Update: 2025-01-15
See Project
25

Siddhi Core Libraries

Stream Processing and Complex Event Processing Engine

Siddhi is a fully open source, cloud-native, scalable, micro streaming, and complex event processing system capable of building event-driven applications for use cases such as real-time analytics, data integration, notification management, and adaptive decision-making. Event processing logic can be written using Streaming SQL queries via graphical and source editors, to capture events from diverse data sources, process and analyze them, integrate with multiple services and data stores, and publish output to various endpoints in real time. ...

Downloads: 0 This Week

Last Update: 2025-03-05
See Project