Showing 1483 open source projects for "jpk data processing"

  • 1
    Data-Juicer

    Data processing for and with foundation models

    Data-Juicer is an open-source data processing and augmentation framework designed to enhance the quality and diversity of datasets for machine learning tasks. It includes a modular pipeline for scalable data transformation.
    Downloads: 2 This Week
  • 2
    Data Formulator

    Create rich visualizations with AI

    To create rich visualizations, data analysts often need to iterate back and forth among data processing and chart specification to achieve their goals. To achieve this, analysts need not only proficiency in data transformation and visualization tools but also efforts to manage the branching history consisting of many different versions of data and charts. Recent LLM-powered AI systems have greatly improved visualization authoring experiences, for example by mitigating manual data transformation barriers via LLMs' code generation ability. ...
    Downloads: 0 This Week
  • 3
    NYC Taxi Data

    Import public NYC taxi and for-hire vehicle (Uber, Lyft)

    The nyc-taxi-data repository is a rich dataset and exploratory project around New York City taxi trip records. It collects and preprocesses large-scale trip datasets (fares, pickup/dropoff, timestamps, locations, passenger counts) to enable data analysis, modeling, and visualization efforts. The project includes scripts and notebooks for cleaning and filtering the raw data, memory-efficient processing for large CSV/Parquet files, and aggregation workflows (e.g. trips per hour, heatmaps of pickups/dropoffs). ...
    Downloads: 2 This Week
  • 4
    Agentic Data Scientist

    An end-to-end Data Scientist

    ...Each agent is designed to independently call functions, interact with data sources, and adapt to uncertainties during processing, enabling iterative refinement of models without manual coordination. The framework supports interoperability with existing data tools and libraries, letting the agents leverage libraries like pandas, scikit-learn, and visualization frameworks to perform real computations rather than mock demonstrations.
    Downloads: 0 This Week
  • 5
    Kapacitor

    Open source framework for processing, monitoring, and alerting

    Open source framework for processing, monitoring, and alerting on time series data. Kapacitor is a real-time data processing engine for monitoring and alerting, specifically designed to work with time-series data from InfluxDB.
    Downloads: 3 This Week
  • 6
    LAStools

    efficient tools for LiDAR processing

    LAStools is a collection of efficient, multi-core, scriptable tools for processing LiDAR data. It supports various formats, including LAS, LAZ, Terrasolid BIN, and ESRI Shapefiles, providing a comprehensive suite for LiDAR data management and analysis.
    Downloads: 33 This Week
  • 7
    CyberChef

    A web app for encryption, encoding, compression and data analysis

    CyberChef, developed by GCHQ, is a versatile web application dubbed the "Cyber Swiss Army Knife." It enables users to perform a wide array of operations on data, including encryption, encoding, compression, and analysis, all within a browser interface.
    Downloads: 62 This Week
  • 8
    go-streams

    A lightweight stream processing library for Go

    A lightweight stream processing library for Go. go-streams provides a simple and concise DSL to build data pipelines. In computing, a pipeline, also known as a data pipeline, is a set of data processing elements connected in series, where the output of one element is the input of the next one. The elements of a pipeline are often executed in parallel or in time-sliced fashion.
    Downloads: 1 This Week
  • 9
    Bytewax

    Python Stream Processing

    ...Bytewax is a Python framework and Rust distributed processing engine that uses a dataflow computational model to provide parallelizable stream processing and event processing capabilities similar to Flink, Spark, and Kafka Streams. You can use Bytewax for a variety of workloads from moving data à la Kafka Connect style all the way to advanced online machine learning workloads. Bytewax is not limited to streaming applications but excels anywhere that data can be distributed at the input and output.
    Downloads: 3 This Week
  • 10
    Arroyo

    Distributed stream processing engine in Rust

    Arroyo is a distributed stream processing engine written in Rust, designed to efficiently perform stateful computations on streams of data. Unlike traditional batch processing, streaming engines can operate on both bounded and unbounded sources, emitting results as soon as they are available.
    Downloads: 1 This Week
  • 11
    Numaflow

    Kubernetes-native platform to run massively parallel data/streaming

    Numaflow is a Kubernetes-native tool for running massively parallel stream processing. A Numaflow Pipeline is implemented as a Kubernetes custom resource and consists of one or more source, data processing, and sink vertices. Numaflow installs in a few minutes and is easier and cheaper to use for simple data processing applications than a full-featured stream processing platform.
    Downloads: 3 This Week
  • 12
    pdfcpu

    A PDF processor written in Go

    pdfcpu is a PDF processing library written in Go supporting encryption. It provides both an API and a CLI. Supported are all versions up to PDF 1.7 (ISO-32000). This is an effort to build a comprehensive PDF processing library from the ground up written in Go. Over time pdfcpu aims to support the standard range of PDF processing features and also any interesting use cases that may present themselves along the way. The main focus lies on strong support for batch processing and scripting via a...
    Downloads: 17 This Week
  • 13
    MeshLab

    The open source mesh processing system

    ...VCG can be used as a stand-alone large-scale automated mesh processing pipeline, while MeshLab makes it easy to experiment with its algorithms interactively. The open source system for processing and editing 3D triangular meshes. It provides a set of tools for editing, cleaning, healing, inspecting, rendering, texturing and converting meshes. It offers features for processing raw data produced by 3D digitization tools/devices and for preparing models for 3D printing.
    Downloads: 30 This Week
  • 14
    ExtractThinker

    ExtractThinker is a Document Intelligence library for LLMs

    ExtractThinker is a tool designed to facilitate the extraction and analysis of information from various data sources, aiding in data processing and knowledge discovery.
    Downloads: 3 This Week
  • 15
    Pachyderm

    Data-Centric Pipelines and Data Versioning

    ...Pachyderm provides a powerful solution to optimize data processing, MLOps, and ML Lifecycles.
    Downloads: 2 This Week
  • 16
    SageMaker Spark Container

    Docker image used to run data processing workloads

    Apache Spark™ is a unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing.
    Downloads: 3 This Week
  • 17
    Airborne Data Processing and Analysis

    Software to process and analyze airborne measurements.

    The Airborne Data Processing and Analysis (ADPAA) package is an open-source software package containing a collection of programs and scripts to process and analyze data from in-situ instruments deployed on airborne platforms. The ADPAA package was started to process data on the North Dakota Citation Research Aircraft but has been used to process data on many airborne platforms.
    Downloads: 6 This Week
  • 18
    Diffgram

    Training data (data labeling, annotation, workflow) for all data types

    ...Training Data is the art of supervising machines through data. This includes the activity of annotation, which produces structured data ready to be consumed by a machine learning model. Annotation is required because raw media is considered unstructured and is not usable without it. That’s why training data is required for many modern machine learning use cases, including computer vision, natural language processing, and speech recognition.
    Downloads: 10 This Week
  • 19
    ThingsBoard

    Device management, data collection, processing and visualization

    ...Define relations between your devices, assets, customers or any other entities. Collect and store telemetry data in a scalable and fault-tolerant way. Visualize your data with built-in or custom widgets and flexible dashboards. Share dashboards with your customers. Define data processing rule chains. Transform and normalize your device data. Raise alarms on incoming telemetry events, attribute updates, device inactivity, and user actions.
    Downloads: 7 This Week
  • 20
    jq

    Lightweight and flexible command-line JSON processor

    jq is like sed for JSON data - you can use it to slice, filter, map and transform structured data with the same ease that sed, awk, grep and friends let you play with text. jq is written in portable C, and it has zero runtime dependencies. You can download a single binary, scp it to a far away machine of the same type, and expect it to work. jq can mangle the data format that you have into the one that you want with very little effort, and the program to do so is often shorter and simpler...
    Downloads: 64 This Week
  • 21
    EEGLAB

    EEGLAB is an open source signal processing environment

    EEGLAB is an open source, MATLAB-based interactive environment for analyzing electrophysiological signals such as EEG and MEG. It incorporates powerful tools for data import, preprocessing, independent component analysis (ICA), time-frequency analysis, artifact rejection, and visualization—all within a GUI framework that also supports scripting and plugin extensions. EEGLAB is an open source signal processing environment for electrophysiological signals running on Matlab and Octave (command line only for Octave). ...
    Downloads: 12 This Week
  • 22
    Best-of Python

    A ranked list of awesome Python open-source libraries

    ...Ranked list of awesome python libraries for web development. Correctly generate plurals, ordinals, indefinite articles; convert numbers. Libraries for loading, collecting, and extracting data from a variety of data sources and formats. Libraries for data batch- and stream-processing, workflow automation, job scheduling, and other data pipeline tasks.
    Downloads: 4 This Week
  • 23
    Unstructured.IO

    Open source libraries and APIs to build custom preprocessing pipelines

    The unstructured library provides open-source components for ingesting and pre-processing images and text documents, such as PDFs, HTML, Word docs, and many more. Its use cases revolve around streamlining and optimizing the data processing workflow for LLMs. The library's modular bricks and connectors form a cohesive system that simplifies data ingestion and pre-processing, making it adaptable to different platforms and efficient at transforming unstructured data into structured outputs.
    Downloads: 6 This Week
  • 24
    Pathway

    Python ETL framework for stream processing, real-time analytics, LLM

    ...Unlike traditional batch processing frameworks, Pathway continuously updates the results of your data logic as new events arrive, functioning more like a database that reacts in real-time. It supports Python, integrates with modern data tools, and offers a deterministic dataflow model to ensure reproducibility and correctness.
    Downloads: 0 This Week
  • 25
    Awesome Fraud Detection Research Papers

    A curated list of data mining papers about fraud detection

    A curated list of data mining papers about fraud detection from several conferences.
    Downloads: 0 This Week