data processing free download

790 projects for "data processing" with 1 filter applied:

1

Synthetic Data Generator

SDG is a specialized framework

...It also includes a data processing module capable of handling different data types, preprocessing columns, managing missing values, and converting formats automatically before model training.

Downloads: 1 This Week

Last Update: 2026-03-06
2

NYC Taxi Data

Import public NYC taxi and for-hire vehicle (Uber, Lyft)

The nyc-taxi-data repository is a rich dataset and exploratory project around New York City taxi trip records. It collects and preprocesses large-scale trip datasets (fares, pickup/dropoff, timestamps, locations, passenger counts) to enable data analysis, modeling, and visualization efforts. The project includes scripts and notebooks for cleaning and filtering the raw data, memory-efficient processing for large CSV/Parquet files, and aggregation workflows (e.g. trips per hour, heatmaps of pickups/dropoffs). ...
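A "trips per hour" aggregation of the kind mentioned above can be sketched in a few lines. This is only an illustration, not code from the repository: the column name `pickup_datetime`, the timestamp format, and the sample rows are all assumptions. Rows are read one at a time, so memory use stays flat on large files.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Hypothetical sample data; real trip files have many more columns,
# and the "pickup_datetime" column name is an assumption.
SAMPLE_CSV = """pickup_datetime,passenger_count
2024-01-01 08:05:00,1
2024-01-01 08:45:00,2
2024-01-01 09:10:00,1
2024-01-02 08:30:00,3
"""


def trips_per_hour(csv_file):
    """Count trips by pickup hour of day, streaming one row at a time."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        pickup = datetime.strptime(row["pickup_datetime"], "%Y-%m-%d %H:%M:%S")
        counts[pickup.hour] += 1
    # Sort by hour so the result is easy to read or plot.
    return dict(sorted(counts.items()))


hourly = trips_per_hour(io.StringIO(SAMPLE_CSV))
print(hourly)
```

For a file on disk, the same function could be called with `open(path, newline="")` in place of the in-memory sample.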

Downloads: 3 This Week

Last Update: 2025-10-01
3

Agentic Data Scientist

An end-to-end Data Scientist

...Each agent is designed to independently call functions, interact with data sources, and adapt to uncertainties during processing, enabling iterative refinement of models without manual coordination. The framework supports interoperability with existing data tools and libraries, letting the agents leverage libraries like pandas, scikit-learn, and visualization frameworks to perform real computations rather than mock demonstrations.

Downloads: 1 This Week

Last Update: 2026-02-05
4

Kapacitor

Open source framework for processing, monitoring, and alerting

Open source framework for processing, monitoring, and alerting on time series data. Kapacitor is a real-time data processing engine for monitoring and alerting, specifically designed to work with time-series data from InfluxDB.

Downloads: 2 This Week

Last Update: 7 days ago
5

Broadway

Concurrent and multi-stage data ingestion and data processing

Broadway is a data processing library for Elixir designed to handle high-throughput, concurrent workloads with ease. It provides an abstraction for defining pipelines that consume data from sources like RabbitMQ, Kafka, Amazon SQS, or custom producers. Each pipeline is fault-tolerant and backpressure-aware, ensuring stable throughput even under load.

Downloads: 3 This Week

Last Update: 6 days ago
6

jq

Lightweight and flexible command-line JSON processor

jq is like sed for JSON data - you can use it to slice, filter, map and transform structured data with the same ease that sed, awk, grep and friends let you play with text. jq is written in portable C, and it has zero runtime dependencies. You can download a single binary, scp it to a far away machine of the same type, and expect it to work. jq can mangle the data format that you have into the one that you want with very little effort, and the program to do so is often shorter and simpler...

Downloads: 97 This Week

Last Update: 2025-07-01
7

Apache Spark

A unified analytics engine for large-scale data processing

Apache Spark is a unified engine for large-scale data processing, offering APIs for batch jobs, streaming, machine learning, and graph computation. It builds on resilient distributed datasets (RDDs) and the newer DataFrame/Dataset abstractions to provide fault-tolerant, in-memory computation across clusters. Spark’s execution engine handles scheduling, shuffles, caching, and data locality so users can focus on transformations rather than infrastructure plumbing. ...

Downloads: 10 This Week

Last Update: 2026-04-06
8

Easy3D

Efficient library for processing 3D data

Easy3D is a lightweight, easy-to-use, and efficient library for processing and rendering 3D data, implemented in C++ with Python bindings. It is designed for tasks such as 3D modeling, geometry processing, and rendering, emphasizing simplicity and efficiency. Easy3D serves as a valuable tool for research, education, and the development of sophisticated 3D applications, providing a solid foundation for handling 3D data.

Downloads: 6 This Week

Last Update: 2025-03-20
9

fluentbit

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX

Fluent Bit is a super-fast, lightweight, and highly scalable logging and metrics processor and forwarder. It is the preferred choice for cloud and containerized environments. A robust, lightweight, and portable architecture for high throughput with low CPU and memory usage from any data source to any destination. Proven across distributed cloud and container environments. Highly available with I/O handlers to store data for disaster recovery. Granular management of data parsing and routing....

Downloads: 12 This Week

Last Update: 2026-04-14
10

Siddhi Core Libraries

Stream Processing and Complex Event Processing Engine

Fully open source, cloud-native, scalable, micro streaming, and complex event processing system capable of building event-driven applications for use cases such as real-time analytics, data integration, notification management, and adaptive decision-making. Event processing logic can be written using Streaming SQL queries via graphical and source editors, to capture events from diverse data sources, process and analyze them, integrate with multiple services and data stores, and publish output to various endpoints in real time. ...

Downloads: 0 This Week

Last Update: 2025-03-05
11

LOTUS

AI-Powered Data Processing: Use LOTUS to process all of your datasets

LOTUS is an open-source framework and query engine designed to enable efficient processing of structured and unstructured datasets using large language models. The system provides a declarative programming model that allows developers to express complex AI data operations using high-level commands rather than manually orchestrating model calls. It offers a Python interface with a Pandas-like API, making it familiar for data scientists and engineers already working with data analysis libraries. ...

Downloads: 1 This Week

Last Update: 2026-03-06
12

The Grand Complete Data Science Guide

Data Science Guide With Videos And Materials

The Grand Complete Data Science Materials is a repository curated by a data-science educator that aggregates a wide range of learning resources (from basic programming and math foundations to advanced topics in machine learning, deep learning, natural language processing, computer vision, and deployment practices) into a structured, centralized collection aimed at learners seeking a comprehensive path to data science mastery.

Downloads: 0 This Week

Last Update: 2025-12-02
13

TeXworks

A simple interface for working with TeX documents

TeXworks is a free and simple working environment for authoring TeX (LaTeX, ConTeXt and XeTeX) documents. Inspired by Dick Koch's award-winning TeXShop program for Mac OS X, it makes entry into the TeX world easier for those using desktop operating systems other than OS X. It provides an integrated, easy-to-use environment for users on other platforms, particularly GNU/Linux and Windows, and features a clean, simple interface accessible to casual and non-technical users.

1 Review

Downloads: 93 This Week

Last Update: 2026-02-11
14

cobalt

Video and media downloader: Best way to save what you love

Cobalt is an open-source media downloader and tool designed to provide a high-performance and privacy-focused alternative for interacting with online media content, particularly focused on downloading and processing media from various platforms. It emphasizes speed, reliability, and a clean user experience, allowing users to retrieve media without unnecessary tracking, ads, or intrusive elements commonly found in web-based tools. The project is built with performance in mind, leveraging efficient backend processing to handle requests quickly and consistently. ...

Downloads: 17 This Week

Last Update: 2026-04-06
15

lxml

The lxml XML toolkit for Python

A Python library for efficient XML and HTML processing, known for speed and compatibility. The lxml XML toolkit is a Pythonic binding for the C libraries libxml2 and libxslt. It is unique in that it combines the speed and XML feature completeness of these libraries with the simplicity of a native Python API, mostly compatible with, but superior to, the well-known ElementTree API. The latest release works with all CPython versions from 3.6 to 3.12. See the introduction for more information about the...
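For readers unfamiliar with the ElementTree API mentioned above, here is a minimal sketch using the standard library's `xml.etree.ElementTree`. It deliberately does not import lxml, so it runs without extra installs. The calls shown (`fromstring`, `iter`, `findall`, `findtext`, `get`) also exist in `lxml.etree`, so swapping the import should generally work for this snippet, though that is an assumption to verify in your own environment. The sample document is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, for illustration only.
XML = (
    "<catalog>"
    "<book id='1'><title>A</title></book>"
    "<book id='2'><title>B</title></book>"
    "</catalog>"
)

root = ET.fromstring(XML)

# Walk every <book> element and read the text of its <title> child.
titles = [book.findtext("title") for book in root.iter("book")]

# Read an attribute from each direct <book> child of the root.
ids = [book.get("id") for book in root.findall("book")]

print(titles, ids)
```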

Downloads: 16 This Week

Last Update: 5 days ago
16

DocETL

A system for agentic LLM-powered data processing and ETL

DocETL is an open-source system designed to build and execute data processing pipelines powered by large language models, particularly for analyzing complex collections of documents and unstructured datasets. The platform allows developers and researchers to construct structured workflows that extract, transform, and organize information from sources such as reports, transcripts, legal documents, and other text-heavy data.

Downloads: 0 This Week

Last Update: 2026-03-05
17

python-small-examples

Focus on creating classic Python small examples and cases

python-small-examples is an open-source educational repository that contains hundreds of concise Python programming examples designed to illustrate practical coding techniques. The project focuses on teaching programming concepts through small, focused scripts that demonstrate common tasks in data processing, visualization, and general programming. Each example highlights a specific function or programming pattern so that learners can quickly understand how to apply Python features in real-world scenarios. The repository includes examples covering topics such as file processing, JSON manipulation, data visualization, and library usage. ...

Downloads: 2 This Week

Last Update: 2026-03-10
18

Acl

A powerful server and network library, including coroutine

The Acl (Advanced C/C++ Library) project is a powerful multi-platform network communication library and service framework, supporting LINUX, WIN32, Solaris, FreeBSD, MacOS, AndroidOS, iOS. Many applications written with Acl run on devices with Linux, Windows, iPhone and Android and serve billions of users. There are some important modules in the Acl project, including network communication, server framework, application protocols, multiple coders, etc. The common protocols such as...

Downloads: 6 This Week

Last Update: 2026-03-09
19

E2M

E2M converts various file types (doc, docx, epub, html, htm, url

E2M is a SourceForge mirror of the e2m open-source project, which focuses on providing tools or services designed to convert or process content between different formats or systems. Projects with similar naming conventions typically emphasize automation workflows where input data from one environment is transformed into another representation or output structure. The mirrored repository allows users to access the project’s codebase independently from its original hosting platform while preserving the development history and release artifacts. Systems like e2m often serve as middleware components that connect different software systems or facilitate data processing pipelines. ...

Downloads: 0 This Week

Last Update: 2026-03-09
20

OmniTools

Self-hosted collection of powerful web-based tools for everyday tasks

...It’s designed to replace the random assortment of “free online tools” people use for quick tasks, while avoiding ads, tracking, and the need to upload sensitive files to unknown servers. A key design choice is that file processing happens entirely on the client side, meaning your data stays in your browser instead of being sent to the backend. The tool catalog spans both technical and non-technical needs, including image, video, audio, PDF, text, date/time, math, and data format utilities like JSON/CSV/XML helpers. It’s also packaged for straightforward self-hosting, with a lightweight Docker image and simple run commands, so it can be deployed quickly on a homelab or internal network.

Downloads: 9 This Week

Last Update: 2026-01-27
21

Apache Flink

Stream processing framework with powerful stream

Apache Flink is a distributed engine for stateful computations over data streams and batches, designed for low-latency processing at scale. Its core runtime executes dataflow graphs with fine-grained backpressure and checkpointing, allowing applications to recover consistently from failures. Flink’s event-time model and watermarks enable accurate out-of-order processing, windowing, and complex time semantics that typical real-time systems struggle with.
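The event-time and watermark idea described above can be illustrated with a small pure-Python simulation. This is not Flink's API: the function name, the tumbling-window policy, and the "bounded out-of-orderness" watermark (largest timestamp seen minus a fixed delay) are simplifying assumptions chosen for the sketch. A window fires once the watermark passes its end, and events arriving for an already-fired window are counted as late.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_size, max_delay):
    """Simulate event-time tumbling windows with a simple watermark.

    events: iterable of (event_time, payload) in arrival order.
    Returns (fired, late), where fired is a list of (window_start, count)
    in firing order and late is the number of dropped late events.
    """
    open_windows = defaultdict(int)
    fired = []
    late = 0
    max_seen = float("-inf")

    for event_time, _payload in events:
        start = (event_time // window_size) * window_size
        watermark = max_seen - max_delay
        if start + window_size <= watermark:
            # The window for this event has already fired.
            late += 1
            continue
        open_windows[start] += 1

        # Advance the watermark and fire every window it has passed.
        max_seen = max(max_seen, event_time)
        watermark = max_seen - max_delay
        for w in sorted(open_windows):
            if w + window_size <= watermark:
                fired.append((w, open_windows.pop(w)))

    # End of a bounded stream: flush whatever is still open.
    for w in sorted(open_windows):
        fired.append((w, open_windows[w]))
    return fired, late


# Hypothetical out-of-order arrivals: 3 and 8 arrive after later timestamps.
events = [(1, "a"), (4, "a"), (3, "a"), (12, "a"), (8, "a"), (25, "a"), (2, "a")]
fired, late = tumbling_window_counts(events, window_size=10, max_delay=5)
print(fired, late)
```

In this run the event at time 8 still lands in window 0 because the watermark has not yet reached 10, while the event at time 2 arrives after that window fired and is dropped. A larger `max_delay` tolerates more disorder at the cost of later results.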

Downloads: 1 This Week

Last Update: 5 days ago
22

WebP Codec

Library to encode and decode images in WebP format

libwebp is the reference codec library for Google’s WebP image format, providing both encoding and decoding along with command-line tools. It supplies cwebp to compress images into WebP and dwebp to decompress them back, making it easy to test quality/size trade-offs across presets and tuning parameters. The GitHub repository is a mirror; the canonical source of truth lives on Chromium’s git, and developer docs are hosted on WebP’s portal. The project underpins WebP support across browsers,...

Downloads: 30 This Week

Last Update: 2025-10-12
23

Apache Sedona

Cluster computing framework for processing large-scale geospatial data

Apache Sedona™ is a cluster computing system for processing large-scale spatial data. Sedona extends existing cluster computing systems, such as Apache Spark and Apache Flink, with a set of out-of-the-box distributed Spatial Datasets and Spatial SQL that efficiently load, process, and analyze large-scale spatial data across machines. According to our benchmark and third-party research papers, Sedona runs 2X - 10X faster than other Spark-based geospatial data systems on computation-intensive query workloads. ...

Downloads: 0 This Week

Last Update: 2026-01-05
24

spider_collection

Collection of Python web scraping scripts for data extraction tasks

...In addition to raw data collection, some spiders include basic data processing and analysis using tools such as pandas and simple visualization with matplotlib. It also contains examples of proxy pool integration and encapsulation to support more reliable crawling when working with sites that enforce request limits.

Downloads: 3 This Week

Last Update: 7 days ago
25

ESPnet

End-to-end speech processing toolkit

ESPnet is a comprehensive end-to-end speech processing toolkit covering a wide spectrum of tasks, including automatic speech recognition (ASR), text-to-speech (TTS), speech translation (ST), speech enhancement, speaker diarization, and spoken language understanding. It uses PyTorch as its deep learning engine and adopts a Kaldi-style data processing pipeline for features, data formats, and experimental recipes.

Downloads: 1 This Week

Last Update: 1 day ago