Showing 22 open source projects for "jpk data processing"

View related business solutions
  • Cut Cloud Costs with Google Compute Engine Icon
    Cut Cloud Costs with Google Compute Engine

    Save up to 91% with Spot VMs and get automatic sustained-use discounts. One free VM per month, plus $300 in credits.

    Save on compute costs with Compute Engine. Reduce your batch jobs and workload bill 60-91% with Spot VMs. Compute Engine's committed use offers customers up to 70% savings through sustained use discounts. Plus, you get one free e2-micro VM monthly and $300 credit to start.
    Try Compute Engine
  • Go from Data Warehouse to Data and AI platform with BigQuery Icon
    Go from Data Warehouse to Data and AI platform with BigQuery

    Build, train, and run ML models with simple SQL. Automate data prep, analysis, and predictions with built-in AI assistance from Gemini.

    BigQuery is more than a data warehouse—it's an autonomous data-to-AI platform. Use familiar SQL to train ML models, run time-series forecasts, and generate AI-powered insights with native Gemini integration. Built-in agents handle data engineering and data science workflows automatically. Get $300 in free credit, query 1 TB, and store 10 GB free monthly.
    Try BigQuery Free
  • 1
    Arroyo

    Arroyo

    Distributed stream processing engine in Rust

    Arroyo is a distributed stream processing engine written in Rust, designed to efficiently perform stateful computations on streams of data. Unlike traditional batch processing, streaming engines can operate on both bounded and unbounded sources, emitting results as soon as they are available.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 2
    QSV

    QSV

    Blazing-fast Data-Wrangling toolkit

    qsv is a fast, command-line CSV data toolkit written in Rust that extends the capabilities of xsv. It’s designed to make working with CSV files at scale easy and efficient, offering over 40 powerful subcommands for tasks like querying, sampling, splitting, deduplicating, and more. qsv is ideal for data engineers, analysts, and developers who need high-performance CSV manipulation on the command line.
    Downloads: 35 This Week
    Last Update:
    See Project
  • 3
    CocoIndex

    CocoIndex

    ETL framework to index data for AI, such as RAG

    CocoIndex is an open-source framework designed for building powerful, local-first semantic search systems. It lets users index and retrieve content based on meaning rather than keywords, making it ideal for modern AI-based search applications. CocoIndex leverages vector embeddings and integrates with various models and frameworks, including OpenAI and Hugging Face, to provide high-quality semantic understanding. It’s built for transparency, ease of use, and local control over your search...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 4
    Scanopy

    Scanopy

    Clean network diagrams, One-time setup, zero upkeep

    Scanopy is a powerful multi-modal data capture and analysis toolkit that enables users to collect, process, and visualize structured and unstructured information from a variety of sources in a flexible pipeline. It is built to handle complex scanning tasks — such as OCR, document analysis, audio transcription, network data capture, and image extraction — while providing unified APIs and workflows that make managing heterogeneous data sources seamless. Developers can compose custom pipelines...
    Downloads: 15 This Week
    Last Update:
    See Project
  • Deploy Apps in Seconds with Cloud Run Icon
    Deploy Apps in Seconds with Cloud Run

    Host and run your applications without the need to manage infrastructure. Scales up from and down to zero automatically.

    Cloud Run is the fastest way to deploy containerized apps. Push your code in Go, Python, Node.js, Java, or any language and Cloud Run builds and deploys it automatically. Get fast autoscaling, pay only when your code runs, and skip the infrastructure headaches. Two million requests free per month. And new customers get $300 in free credit.
    Try Cloud Run Free
  • 5
    Meetily

    Meetily

    Privacy first, AI meeting assistant with 4x faster Parakeet/Whisper

    This project is a privacy-first AI meeting assistant that captures meeting audio, produces real-time transcripts, and generates summaries while keeping processing entirely on your own machine or infrastructure. It’s built for organizations that want meeting intelligence without sending recordings or transcripts to third-party cloud services, which helps address compliance and data sovereignty requirements. The app supports live transcription with local model options (including Whisper- and Parakeet-based workflows) and presents the transcript as the meeting happens, making it useful both for note-taking and accessibility. ...
    Downloads: 23 This Week
    Last Update:
    See Project
  • 6
    pg_analytics

    pg_analytics

    DuckDB-powered analytics for Postgres

    pg_analytics (formerly named pg_lakehouse) puts DuckDB inside Postgres. With pg_analytics installed, Postgres can query foreign object stores like AWS S3 and table formats like Iceberg or Delta Lake. Queries are pushed down to DuckDB, a high-performance analytical query engine. By transforming Postgres into a performant search and analytics engine, ParadeDB frees your team from the pain of scaling and syncing Elasticsearch.
    Downloads: 59 This Week
    Last Update:
    See Project
  • 7
    Lantern Database

    Lantern Database

    PostgreSQL vector database extension for building AI applications

    Lantern is a real-time data transformation engine that enables data engineers to build, run, and monitor streaming data pipelines with SQL. It’s designed to process events in motion, offering low-latency stream transformations, aggregations, and enrichment in a declarative way. Lantern is especially suited for modern data infrastructure and analytics platforms.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 8
    Luminal

    Luminal

    Deep learning at the speed of light

    Luminal is a framework designed to accelerate and simplify the development of systems-level data applications by using a typed, functional, and streaming-first approach. Instead of treating data processing as a series of ad-hoc scripts, Luminal models transformations as strongly typed building blocks that can be composed into reliable, scalable pipelines. The project emphasizes correctness and performance by requiring explicit types for the data flowing through transformations, reducing runtime surprises and allowing for highly optimized execution. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    GreptimeDB

    GreptimeDB

    An open-source, cloud-native, unified time series database for metrics

    GreptimeDB treats all time series as contextual events with timestamps, and thus unifies the processing of metrics, logs, and events. It supports analyzing metrics, logs, and events with SQL, PromQL, and streaming with continuous aggregation. GreptimeDB is a time-series database optimized for storing and querying large amounts of time-series data, commonly used in monitoring and IoT applications.
    Downloads: 0 This Week
    Last Update:
    See Project
  • AI-powered service management for IT and enterprise teams Icon
    AI-powered service management for IT and enterprise teams

    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity. Maximize operational efficiency with refreshingly simple, AI-powered Freshservice.
    Try it Free
  • 10
    PostgresML

    PostgresML

    The GPU-powered AI application database

    ...Combine and automate the entire workflow from embedding generation to indexing and querying for the simplest (and fastest) knowledge-based chatbot implementation. Leverage multiple types of natural language processing and machine learning models such as vector search and personalization with embeddings to improve search results. Leverage your data with time series forecasting to garner key business insights. Build statistical and predictive models with the full power of SQL and dozens of regression algorithms. Return results and detect fraud faster with ML at the database layer. ...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 11
    ReductStore

    ReductStore

    The fastest time series object store for Edge AI

    History storage and management of images, vibration data, text, labels, and more - all in one place with the highest performance. Merge blob and time series functionalities, reducing the need for multiple databases. Customize real-time data retention policies and replication strategies. Store billions of time-stamped blobs with AI labels and access them with low latency. Outperform other databases with a customized solution for time-series object data. Capture and access blob data as time...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 12
    IronCalc

    IronCalc

    Main engine of the IronCalc ecosystem

    IronCalc is a new, modern, work-in-progress spreadsheet engine and set of tools to work with spreadsheets in diverse settings. IronCalc is a lightweight, open-source computational engine designed for performing mathematical operations, formula calculations, and data-driven tasks.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 13
    jaq

    jaq

    A jq clone focussed on correctness, speed, and simplicity

    jaq (pronounced like Jacques) is a clone of the JSON data processing tool jq. jaq aims to support a large subset of jq's syntax and operations. Jaq aims to provide a more correct and predictable implementation of jq, while preserving compatibility with jq in most cases.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Polars

    Polars

    Dataframes powered by a multithreaded, vectorized query engine

    Polars is a high-performance, multi-language DataFrame library built in Rust using Apache Arrow. It delivers blazing-fast, vectorized, and parallel data manipulation with both eager and lazy execution, making it an excellent tool for data processing in Python, Rust, Node.js, R, and SQL contexts.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 15
    Lingua-RS

    Lingua-RS

    The most accurate natural language detection library for Rust

    Lingua-RS is a language detection library implemented in Rust, designed to accurately identify the language of given text samples. It tells you which language some text is written in. This is very useful as a preprocessing step for linguistic data in natural language processing applications such as text classification and spell checking. Other use cases, for instance, might include routing e-mails to the right geographically located customer service department, based on the e-mails' languages.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Arkflow

    Arkflow

    High performance Rust stream processing engine

    Arkflow is a Rust-based framework for building reactive, event-driven data pipelines. Inspired by tools like Airflow and Dagster, it focuses on strong typing, modularity, and performance. Arkflow is ideal for developers who want a fast, extensible way to orchestrate workflows and data transformations in Rust.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Biome

    Biome

    A toolchain for web projects, aimed to provide functionalities

    Biome formats and lints your code in a fraction of a second. Biome supports JavaScript, TypeScript, JSON, and CSS. It aims to support all main languages of modern web development. Biome has sane defaults and requires minimal configuration. Biome helps you as much as possible by displaying detailed and contextualized diagnostics. Biome unifies functionality that has previously been separate tools. Building upon a shared base allows us to provide a cohesive experience for processing code,...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 18
    Vector

    Vector

    A high-performance observability data pipeline

    Vector is a Rust‑based, high‑performance observability data pipeline tool (agent + aggregator) designed to collect, transform, and route logs and metrics at scale. Created by Datadog, it aims to be the only tool needed from ingestion to vendor output, providing cost-efficient, safe, and flexible telemetry processing.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    CloudI: A Cloud at the lowest level
    CloudI is an open-source private cloud computing framework for efficient, secure, and internal data processing. CloudI provides scaling for previously unscalable source code with efficient fault-tolerant execution of ATS, C/C++, Erlang/Elixir, Go, Haskell, Java, JavaScript/node.js, OCaml, Perl, PHP, Python, Ruby, or Rust services. The bare essentials for efficient fault-tolerant processing on a cloud!
    Downloads: 16 This Week
    Last Update:
    See Project
  • 20
    Barter

    Barter

    Open-source Rust framework for building event-driven systems

    ...Use mock MarketStream or Execution components to enable back-testing on a near-identical trading system as live-trading. Centralised cache-friendly state management system with O(1) constant lookups using indexed data structures. Robust Order management system - use stand-alone or with Barter. Turn on/off algorithmic trading from an external process (eg/ UI, Telegram, etc.) whilst still processing market/account data.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Amadeus

    Amadeus

    Harmonious distributed data analysis in Rust

    Amadeus is a high-performance, distributed data processing framework written in Rust, designed to offer an ergonomic and safe alternative to tools like Apache Spark. It provides both streaming and batch capabilities, allowing users to work with real-time and historical data at scale. Thanks to Rust’s memory safety and zero-cost abstractions, Amadeus delivers performance gains while reducing the complexity and bugs common in large-scale data pipelines. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    Xi Editor

    Xi Editor

    A modern editor with a backend written in Rust

    ...Its architecture splits a thin UI layer from a high-performance core engine (written in Rust) that handles buffer editing, syntax highlighting, undo/redo, searching, and background processing asynchronously. This separation lets the core focus on speed and concurrency, while multiple frontends (macOS, GTK, web) handle rendering, input, and platform bridging. Xi is designed to scale to huge files (gigabyte scale) without lag, thanks to a rope data structure and efficient diffing algorithms. Plugins communicate with the core via JSON-RPC, enabling asynchronous extensions without blocking editing responsiveness. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB
Gen AI apps are built with MongoDB Atlas
Atlas offers built-in vector search and global availability across 125+ regions. Start building AI apps faster, all in one place.