data processing free download

Showing 39 open source projects for "data processing"


Language filter: Rust

1

Arroyo

Distributed stream processing engine in Rust

Arroyo is a distributed stream processing engine written in Rust, designed to efficiently perform stateful computations on streams of data. Unlike traditional batch processing, streaming engines can operate on both bounded and unbounded sources, emitting results as soon as they are available.

Downloads: 0 This Week

Last Update: 2025-12-01
2

Goose Swift

Goose Swift proof-of-concept README

...It is currently an alpha proof of concept intended for developers who can build the Swift app and Rust core themselves. The app connects to WHOOP 5.0 bands through Bluetooth and routes device data through a local Rust processing layer. It turns available signals into health, recovery, sleep, strain, stress, cardio, energy, coach, and debug views. The project includes a SwiftUI app, a Rust bridge, HealthKit support, a workout Live Activity extension, and internal documentation for its MVP pipeline. It is best understood as an experimental, independent wearable-data project rather than a finished consumer health tracker.

Downloads: 15 This Week

Last Update: 6 days ago
3

CocoIndex

ETL framework to index data for AI, such as RAG

CocoIndex is an open-source framework designed for building powerful, local-first semantic search systems. It lets users index and retrieve content based on meaning rather than keywords, making it ideal for modern AI-based search applications. CocoIndex leverages vector embeddings and integrates with various models and frameworks, including OpenAI and Hugging Face, to provide high-quality semantic understanding. It’s built for transparency, ease of use, and local control over your search...

Downloads: 3 This Week

Last Update: 1 day ago
4

Arnis

Generate any location from the real world in Minecraft

...The tool handles large-scale geospatial processing and transforms raw mapping data into a format compatible with Minecraft world generation. Users can generate entire regions, including detailed urban layouts and natural terrain, making it useful for education, visualization, or creative world-building projects.

Downloads: 152 This Week

Last Update: 2026-06-16
5

RuVector

Self-Learning, Vector Graph Neural Network, and Database built in Rust

RuVector is part of the broader rUv ecosystem of AI engineering tools and focuses on enabling advanced vector-based processing and intelligent system development within agentic and AI-driven pipelines. The project fits into a larger vision of modular, composable AI infrastructure designed to support autonomous agents, data retrieval, and intelligent automation workflows. It emphasizes extensibility and interoperability with modern AI stacks, allowing developers to integrate vector operations into search, reasoning, or generative systems. ...

Downloads: 16 This Week

Last Update: 6 days ago
6

Scanopy

Clean network diagrams. One-time setup, zero upkeep

Scanopy is a powerful multi-modal data capture and analysis toolkit that enables users to collect, process, and visualize structured and unstructured information from a variety of sources in a flexible pipeline. It is built to handle complex scanning tasks — such as OCR, document analysis, audio transcription, network data capture, and image extraction — while providing unified APIs and workflows that make managing heterogeneous data sources seamless. Developers can compose custom pipelines...

Downloads: 13 This Week

Last Update: 15 hours ago
7

GreptimeDB

An open-source, cloud-native, unified time series database for metrics

GreptimeDB treats all time series as contextual events with timestamps, and thus unifies the processing of metrics, logs, and events. It supports analyzing metrics, logs, and events with SQL, PromQL, and streaming with continuous aggregation. GreptimeDB is a time-series database optimized for storing and querying large amounts of time-series data, commonly used in monitoring and IoT applications.

Downloads: 3 This Week

Last Update: 5 days ago
8

Databend

Cloud-native open source data warehouse for analytics and AI queries

Databend is an open source cloud-native data warehouse designed for large-scale analytics and modern data workloads. Built in Rust, the system focuses on high performance, scalability, and efficient data processing for analytical queries. It is designed with a separation of compute and storage, allowing compute nodes to scale independently while storing data in object storage systems.

Downloads: 0 This Week

Last Update: 2026-04-17
9

QSV

Blazing-fast Data-Wrangling toolkit

qsv is a fast, command-line CSV data toolkit written in Rust that extends the capabilities of xsv. It’s designed to make working with CSV files at scale easy and efficient, offering over 40 powerful subcommands for tasks like querying, sampling, splitting, deduplicating, and more. qsv is ideal for data engineers, analysts, and developers who need high-performance CSV manipulation on the command line.

Downloads: 3 This Week

Last Update: 2026-06-15
10

Meetily

Privacy first, AI meeting assistant with 4x faster Parakeet/Whisper

This project is a privacy-first AI meeting assistant that captures meeting audio, produces real-time transcripts, and generates summaries while keeping processing entirely on your own machine or infrastructure. It’s built for organizations that want meeting intelligence without sending recordings or transcripts to third-party cloud services, which helps address compliance and data sovereignty requirements. The app supports live transcription with local model options (including Whisper- and Parakeet-based workflows) and presents the transcript as the meeting happens, making it useful both for note-taking and accessibility. ...

Downloads: 12 This Week

Last Update: 2026-06-05
11

Edgee

AI gateway with token compression for Claude Code, Codex, and more

Edgee is an edge-native execution platform designed to run AI-driven logic and data processing directly at the network edge, reducing latency and improving responsiveness for modern applications. It enables developers to deploy functions and workflows closer to users, allowing real-time processing without relying heavily on centralized cloud infrastructure. The platform is built to support event-driven architectures, where actions are triggered by incoming requests, user behavior, or external signals. ...

Downloads: 2 This Week

Last Update: 5 days ago
12

Lantern Database

PostgreSQL vector database extension for building AI applications

Lantern is an open-source PostgreSQL extension that adds vector storage and vector search to the database, intended for developers building AI applications on top of Postgres.

Downloads: 0 This Week

Last Update: 2025-06-12
13

Sail

A drop-in Apache Spark replacement written in Rust

Sail is an open-source distributed computation framework designed to unify batch processing, stream processing, and AI workloads into a single, high-performance engine. It is built entirely in Rust, eliminating JVM overhead and enabling predictable performance, fast startup times, and improved memory safety compared to traditional big data frameworks. Sail is compatible with the Spark Connect protocol, which means existing Spark SQL and DataFrame workloads can run without code changes, making adoption seamless for teams already using Spark-based pipelines. ...

Downloads: 0 This Week

Last Update: 2026-06-06
14

jaq

A jq clone focussed on correctness, speed, and simplicity

jaq (pronounced like Jacques) is a clone of the JSON data processing tool jq. It aims to support a large subset of jq's syntax and operations and to provide a more correct and predictable implementation, while preserving compatibility with jq in most cases.

Downloads: 0 This Week

Last Update: 2026-06-11
15

Gyroflow

Video stabilization using gyroscope data

Gyroflow is an advanced open-source video stabilization application that uses gyroscope and motion sensor data to produce highly accurate and cinematic stabilization results. Instead of relying solely on visual estimation like traditional software stabilizers, it processes real motion data recorded by cameras or external sensors to achieve more precise compensation. This approach allows it to correct complex camera movement, rolling shutter distortion, and lens artifacts while preserving...

Downloads: 8 This Week

Last Update: 2026-04-23
16

Vector

A high-performance observability data pipeline

Vector is a Rust-based, high-performance observability data pipeline tool (agent and aggregator) designed to collect, transform, and route logs and metrics at scale. Maintained by Datadog, it aims to be the only tool needed from ingestion to vendor output, providing cost-efficient, safe, and flexible telemetry processing.

Downloads: 3 This Week

Last Update: 6 days ago
17

Lingua-RS

The most accurate natural language detection library for Rust

Lingua-RS is a language detection library implemented in Rust that identifies which language a given text sample is written in. This is useful as a preprocessing step for linguistic data in natural language processing applications such as text classification and spell checking. Other use cases include routing e-mails to the right geographically located customer service department based on the e-mails' languages.

Downloads: 0 This Week

Last Update: 2026-03-09
18

Arkflow

High performance Rust stream processing engine

Arkflow is a Rust-based framework for building reactive, event-driven data pipelines. Inspired by tools like Airflow and Dagster, it focuses on strong typing, modularity, and performance. Arkflow is ideal for developers who want a fast, extensible way to orchestrate workflows and data transformations in Rust.

Downloads: 0 This Week

Last Update: 2025-10-19
19

ReductStore

The fastest time series object store for Edge AI

History storage and management of images, vibration data, text, labels, and more - all in one place with the highest performance. Merge blob and time series functionalities, reducing the need for multiple databases. Customize real-time data retention policies and replication strategies. Store billions of time-stamped blobs with AI labels and access them with low latency. Outperform other databases with a customized solution for time-series object data. Capture and access blob data as time...

Downloads: 1 This Week

Last Update: 4 hours ago
20

wx-cli

WeChat local data CLI with daemon architecture

...It is designed to be AI-agent friendly, with YAML output by default and optional JSON output for automation or downstream processing. The project keeps data local, decrypts in real time, and avoids full pre-decryption workflows. It is useful for users who need searchable, scriptable access to their own WeChat records while preserving local control over the data.

Downloads: 0 This Week

Last Update: 2026-05-15
21

IronCalc

Main engine of the IronCalc ecosystem

IronCalc is a new, modern, work-in-progress spreadsheet engine and set of tools to work with spreadsheets in diverse settings. IronCalc is a lightweight, open-source computational engine designed for performing mathematical operations, formula calculations, and data-driven tasks.

Downloads: 0 This Week

Last Update: 2026-01-25
22

dovi_tool

dovi_tool is a CLI tool combining multiple utilities

dovi_tool is a command-line utility written in Rust that provides a comprehensive set of tools for working with Dolby Vision metadata in video streams. It is designed to analyze, edit, and generate dynamic metadata used in high dynamic range video formats. The tool allows users to extract, inject, and modify RPU data, which controls how Dolby Vision content is displayed on compatible devices. It also supports demuxing and muxing HEVC streams, enabling manipulation of enhancement layers and metadata within video files. dovi_tool is widely used in advanced video encoding workflows where precise control over HDR metadata is required. Its modular design includes multiple subcommands that handle different aspects of processing, from inspection to transformation. ...

Downloads: 26 This Week

Last Update: 2026-04-29
23

Biome

A toolchain for web projects, aimed to provide functionalities

Biome formats and lints your code in a fraction of a second. Biome supports JavaScript, TypeScript, JSON, and CSS. It aims to support all main languages of modern web development. Biome has sane defaults and requires minimal configuration. Biome helps you as much as possible by displaying detailed and contextualized diagnostics. Biome unifies functionality that has previously been separate tools. Building upon a shared base allows us to provide a cohesive experience for processing code,...

Downloads: 1 This Week

Last Update: 9 hours ago
24

PostgresML

The GPU-powered AI application database

...Combine and automate the entire workflow from embedding generation to indexing and querying for the simplest (and fastest) knowledge-based chatbot implementation. Leverage multiple types of natural language processing and machine learning models such as vector search and personalization with embeddings to improve search results. Leverage your data with time series forecasting to garner key business insights. Build statistical and predictive models with the full power of SQL and dozens of regression algorithms. Return results and detect fraud faster with ML at the database layer. ...

Downloads: 0 This Week

Last Update: 2025-01-16
25

Polars

Dataframes powered by a multithreaded, vectorized query engine

Polars is a high-performance, multi-language DataFrame library built in Rust using Apache Arrow. It delivers blazing-fast, vectorized, and parallel data manipulation with both eager and lazy execution, making it an excellent tool for data processing in Python, Rust, Node.js, R, and SQL contexts.

Downloads: 0 This Week

Last Update: 2026-06-04