Showing 39 open source projects for "data processing"

  • 1
    Arroyo

    Distributed stream processing engine in Rust

    Arroyo is a distributed stream processing engine written in Rust, designed to efficiently perform stateful computations on streams of data. Unlike traditional batch processing, streaming engines can operate on both bounded and unbounded sources, emitting results as soon as they are available.
    Downloads: 0 This Week
  • 2
    Goose Swift

    Goose Swift proof-of-concept README

    ...It is currently an alpha proof of concept intended for developers who can build the Swift app and Rust core themselves. The app connects to WHOOP 5.0 bands through Bluetooth and routes device data through a local Rust processing layer. It turns available signals into health, recovery, sleep, strain, stress, cardio, energy, coach, and debug views. The project includes a SwiftUI app, a Rust bridge, HealthKit support, a workout Live Activity extension, and internal documentation for its MVP pipeline. It is best understood as an experimental, independent wearable-data project rather than a finished consumer health tracker.
    Downloads: 15 This Week
  • 3
    CocoIndex

    ETL framework to index data for AI, such as RAG

    CocoIndex is an open-source framework designed for building powerful, local-first semantic search systems. It lets users index and retrieve content based on meaning rather than keywords, making it ideal for modern AI-based search applications. CocoIndex leverages vector embeddings and integrates with various models and frameworks, including OpenAI and Hugging Face, to provide high-quality semantic understanding. It’s built for transparency, ease of use, and local control over your search...
    Downloads: 3 This Week
  • 4
    Arnis

    Generate any location from the real world in Minecraft

    ...The tool handles large-scale geospatial processing and transforms raw mapping data into a format compatible with Minecraft world generation. Users can generate entire regions, including detailed urban layouts and natural terrain, making it useful for education, visualization, or creative world-building projects.
    Downloads: 152 This Week
  • 5
    RuVector

    Self-Learning, Vector Graph Neural Network, and Database built in Rust

    RuVector is part of the broader rUv ecosystem of AI engineering tools and focuses on enabling advanced vector-based processing and intelligent system development within agentic and AI-driven pipelines. The project fits into a larger vision of modular, composable AI infrastructure designed to support autonomous agents, data retrieval, and intelligent automation workflows. It emphasizes extensibility and interoperability with modern AI stacks, allowing developers to integrate vector operations into search, reasoning, or generative systems. ...
    Downloads: 16 This Week
  • 6
    Scanopy

    Clean network diagrams, One-time setup, zero upkeep

    Scanopy is a powerful multi-modal data capture and analysis toolkit that enables users to collect, process, and visualize structured and unstructured information from a variety of sources in a flexible pipeline. It is built to handle complex scanning tasks — such as OCR, document analysis, audio transcription, network data capture, and image extraction — while providing unified APIs and workflows that make managing heterogeneous data sources seamless. Developers can compose custom pipelines...
    Downloads: 13 This Week
  • 7
    GreptimeDB

    An open-source, cloud-native, unified time series database for metrics

    GreptimeDB treats all time series as contextual events with timestamps, and thus unifies the processing of metrics, logs, and events. It supports analyzing metrics, logs, and events with SQL, PromQL, and streaming with continuous aggregation. GreptimeDB is a time-series database optimized for storing and querying large amounts of time-series data, commonly used in monitoring and IoT applications.
    Downloads: 3 This Week
  • 8
    Databend

    Cloud-native open source data warehouse for analytics and AI queries

    Databend is an open source cloud-native data warehouse designed for large-scale analytics and modern data workloads. Built in Rust, the system focuses on high performance, scalability, and efficient data processing for analytical queries. It is designed with a separation of compute and storage, allowing compute nodes to scale independently while storing data in object storage systems.
    Downloads: 0 This Week
  • 9
    QSV

    Blazing-fast Data-Wrangling toolkit

    qsv is a fast, command-line CSV data toolkit written in Rust that extends the capabilities of xsv. It’s designed to make working with CSV files at scale easy and efficient, offering over 40 powerful subcommands for tasks like querying, sampling, splitting, deduplicating, and more. qsv is ideal for data engineers, analysts, and developers who need high-performance CSV manipulation on the command line.
    Downloads: 3 This Week
  • 10
    Meetily

    Privacy first, AI meeting assistant with 4x faster Parakeet/Whisper

    This project is a privacy-first AI meeting assistant that captures meeting audio, produces real-time transcripts, and generates summaries while keeping processing entirely on your own machine or infrastructure. It’s built for organizations that want meeting intelligence without sending recordings or transcripts to third-party cloud services, which helps address compliance and data sovereignty requirements. The app supports live transcription with local model options (including Whisper- and Parakeet-based workflows) and presents the transcript as the meeting happens, making it useful both for note-taking and accessibility. ...
    Downloads: 12 This Week
  • 11
    Edgee

    AI gateway with token compression for Claude Code, Codex, and more

    Edgee is an edge-native execution platform designed to run AI-driven logic and data processing directly at the network edge, reducing latency and improving responsiveness for modern applications. It enables developers to deploy functions and workflows closer to users, allowing real-time processing without relying heavily on centralized cloud infrastructure. The platform is built to support event-driven architectures, where actions are triggered by incoming requests, user behavior, or external signals. ...
    Downloads: 2 This Week
  • 12
    Lantern Database

    PostgreSQL vector database extension for building AI applications

    Lantern is an open-source PostgreSQL extension for storing vector data and running vector search, aimed at developers building AI applications. It adds an HNSW-based index for approximate nearest-neighbor queries over embeddings, so similarity search can run alongside regular SQL in the same database.
    Downloads: 0 This Week
  • 13
    Sail

    A drop-in Apache Spark replacement written in Rust

    Sail is an open-source distributed computation framework designed to unify batch processing, stream processing, and AI workloads into a single, high-performance engine. It is built entirely in Rust, eliminating JVM overhead and enabling predictable performance, fast startup times, and improved memory safety compared to traditional big data frameworks. Sail is compatible with the Spark Connect protocol, which means existing Spark SQL and DataFrame workloads can run without code changes, making adoption seamless for teams already using Spark-based pipelines. ...
    Downloads: 0 This Week
  • 14
    jaq

    A jq clone focussed on correctness, speed, and simplicity

    jaq (pronounced like Jacques) is a clone of the JSON data processing tool jq. It aims to support a large subset of jq's syntax and operations while providing a more correct and predictable implementation, preserving compatibility with jq in most cases.
    Downloads: 0 This Week
  • 15
    Gyroflow

    Video stabilization using gyroscope data

    Gyroflow is an advanced open-source video stabilization application that uses gyroscope and motion sensor data to produce highly accurate and cinematic stabilization results. Instead of relying solely on visual estimation like traditional software stabilizers, it processes real motion data recorded by cameras or external sensors to achieve more precise compensation. This approach allows it to correct complex camera movement, rolling shutter distortion, and lens artifacts while preserving...
    Downloads: 8 This Week
  • 16
    Vector

    A high-performance observability data pipeline

    Vector is a Rust-based, high-performance observability data pipeline tool (agent + aggregator) designed to collect, transform, and route logs and metrics at scale. Maintained by Datadog, it aims to be the only tool needed from ingestion to vendor output, providing cost-efficient, safe, and flexible telemetry processing.
    Downloads: 3 This Week
  • 17
    Lingua-RS

    The most accurate natural language detection library for Rust

    Lingua-RS is a language detection library implemented in Rust that identifies which language a given text sample is written in. This is useful as a preprocessing step for linguistic data in natural language processing applications such as text classification and spell checking. Other use cases include routing e-mails to the right geographically located customer service department based on the e-mails' languages.
    Downloads: 0 This Week
  • 18
    Arkflow

    High performance Rust stream processing engine

    Arkflow is a Rust-based framework for building reactive, event-driven data pipelines. Inspired by tools like Airflow and Dagster, it focuses on strong typing, modularity, and performance. Arkflow is ideal for developers who want a fast, extensible way to orchestrate workflows and data transformations in Rust.
    Downloads: 0 This Week
  • 19
    ReductStore

    The fastest time series object store for Edge AI

    History storage and management of images, vibration data, text, labels, and more - all in one place with the highest performance. Merge blob and time series functionalities, reducing the need for multiple databases. Customize real-time data retention policies and replication strategies. Store billions of time-stamped blobs with AI labels and access them with low latency. Outperform other databases with a customized solution for time-series object data. Capture and access blob data as time...
    Downloads: 1 This Week
  • 20
    wx-cli

    WeChat local data CLI with daemon architecture

    ...It is designed to be AI-agent friendly, with YAML output by default and optional JSON output for automation or downstream processing. The project keeps data local, decrypts in real time, and avoids full pre-decryption workflows. It is useful for users who need searchable, scriptable access to their own WeChat records while preserving local control over the data.
    Downloads: 0 This Week
  • 21
    IronCalc

    Main engine of the IronCalc ecosystem

    IronCalc is a modern, work-in-progress, open-source spreadsheet engine and set of tools for working with spreadsheets in diverse settings. The lightweight engine performs mathematical operations, formula calculations, and data-driven tasks.
    Downloads: 0 This Week
  • 22
    dovi_tool

    dovi_tool is a CLI tool combining multiple utilities

    dovi_tool is a command-line utility written in Rust that provides a comprehensive set of tools for working with Dolby Vision metadata in video streams. It is designed to analyze, edit, and generate dynamic metadata used in high dynamic range video formats. The tool allows users to extract, inject, and modify RPU data, which controls how Dolby Vision content is displayed on compatible devices. It also supports demuxing and muxing HEVC streams, enabling manipulation of enhancement layers and metadata within video files. dovi_tool is widely used in advanced video encoding workflows where precise control over HDR metadata is required. Its modular design includes multiple subcommands that handle different aspects of processing, from inspection to transformation. ...
    Downloads: 26 This Week
  • 23
    Biome

    A toolchain for web projects, aimed to provide functionalities

    Biome formats and lints your code in a fraction of a second. Biome supports JavaScript, TypeScript, JSON, and CSS. It aims to support all main languages of modern web development. Biome has sane defaults and requires minimal configuration. Biome helps you as much as possible by displaying detailed and contextualized diagnostics. Biome unifies functionality that has previously been separate tools. Building upon a shared base allows us to provide a cohesive experience for processing code,...
    Downloads: 1 This Week
  • 24
    PostgresML

    The GPU-powered AI application database

    ...Combine and automate the entire workflow from embedding generation to indexing and querying for the simplest (and fastest) knowledge-based chatbot implementation. Leverage multiple types of natural language processing and machine learning models such as vector search and personalization with embeddings to improve search results. Leverage your data with time series forecasting to garner key business insights. Build statistical and predictive models with the full power of SQL and dozens of regression algorithms. Return results and detect fraud faster with ML at the database layer. ...
    Downloads: 0 This Week
  • 25
    Polars

    Dataframes powered by a multithreaded, vectorized query engine

    Polars is a high-performance, multi-language DataFrame library built in Rust using Apache Arrow. It delivers blazing-fast, vectorized, and parallel data manipulation with both eager and lazy execution, making it an excellent tool for data processing in Python, Rust, Node.js, R, and SQL contexts.
    Downloads: 0 This Week