Showing 351 open source projects for "data processing"

  • 1
    Tesla

    The flexible HTTP client library for Elixir

    The flexible HTTP client library for Elixir, with support for middleware and multiple adapters. Tesla is an HTTP client loosely based on Faraday. It embraces the concept of middleware when processing the request/response cycle. Define a module with use Tesla and choose from a variety of middleware. Tesla is built around the concept of composable middleware. This is very similar to how Plug Router works. All HTTP functions, such as Tesla.get/3 and Tesla.post/4, can take a dynamic client as the...
    Downloads: 6 This Week
  • 2
    Watermill

    Building event-driven applications the easy way in Go

    Go library for building event-driven applications. Our goal was to create a tool that is easy to understand, even by junior developers. It doesn't matter if you want to do Event-driven architecture, CQRS, Event Sourcing or just stream MySQL Binlog to Kafka. Watermill was designed to process hundreds of thousands of messages per second. Every component is built in a way that allows you to configure it for your needs. You can also implement your own middleware for the router. Watermill is...
    Downloads: 5 This Week
  • 3
    .NET for Apache Spark

    A free, open-source, and cross-platform big data analytics framework

    .NET for Apache Spark provides high-performance APIs for using Apache Spark from C# and F#. With these .NET APIs, you can access the most popular Dataframe and SparkSQL aspects of Apache Spark, for working with structured data, and Spark Structured Streaming, for working with streaming data. .NET for Apache Spark is compliant with .NET Standard - a formal specification of .NET APIs that are common across .NET implementations. This means you can use .NET for Apache Spark anywhere you write...
    Downloads: 5 This Week
  • 4
    sharp

    High performance Node.js image processing module

    The typical use case for this high speed Node.js module is to convert large images in common formats to smaller, web-friendly JPEG, PNG, AVIF and WebP images of varying dimensions. Resizing an image is typically 4x-5x faster than using the quickest ImageMagick and GraphicsMagick settings due to its use of libvips. Colour spaces, embedded ICC profiles and alpha transparency channels are all handled correctly. Lanczos resampling ensures quality is not sacrificed for speed. As well as image...
    Downloads: 6 This Week
  • 5
    WebP Codec

    Library to encode and decode images in WebP format

    libwebp is the reference codec library for Google’s WebP image format, providing both encoding and decoding along with command-line tools. It supplies cwebp to compress images into WebP and dwebp to decompress them back, making it easy to test quality/size trade-offs across presets and tuning parameters. The GitHub repository is a mirror; the canonical source of truth lives on Chromium’s git, and developer docs are hosted on WebP’s portal. The project underpins WebP support across browsers,...
    Downloads: 39 This Week
  • 6
    Spring Batch

    Spring Batch is a framework for writing batch applications using Java

    A lightweight, comprehensive batch framework designed to enable the development of robust batch applications vital for the daily operations of enterprise systems. Spring Batch provides reusable functions that are essential in processing large volumes of records, including logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management. It also provides more advanced technical services and features that will enable extremely high-volume and high...
    Downloads: 10 This Week
  • 7
    RuVector

    Self-Learning, Vector Graph Neural Network, and Database built in Rust

    RuVector is part of the broader rUv ecosystem of AI engineering tools and focuses on enabling advanced vector-based processing and intelligent system development within agentic and AI-driven pipelines. The project fits into a larger vision of modular, composable AI infrastructure designed to support autonomous agents, data retrieval, and intelligent automation workflows. It emphasizes extensibility and interoperability with modern AI stacks, allowing developers to integrate vector operations into search, reasoning, or generative systems. ...
    Downloads: 10 This Week
  • 8
    Apache Sedona

    Cluster computing framework for processing large-scale geospatial data

    Apache Sedona™ is a cluster computing system for processing large-scale spatial data. Sedona extends existing cluster computing systems, such as Apache Spark and Apache Flink, with a set of out-of-the-box distributed Spatial Datasets and Spatial SQL that efficiently load, process, and analyze large-scale spatial data across machines. According to our benchmark and third-party research papers, Sedona runs 2X - 10X faster than other Spark-based geospatial data systems on computation-intensive query workloads. ...
    Downloads: 0 This Week
  • 9
    Nuclio

    High-Performance Serverless event and data processing platform

    Nuclio is an open source and managed serverless platform used to minimize development and maintenance overhead and automate the deployment of data-science-based applications. Real-time performance running up to 400,000 function invocations per second. Portable across low-power devices, laptops, edge, on-prem and multi-cloud deployments. The first serverless platform supporting GPUs for optimized utilization and sharing. Automated deployment to production in a few clicks from Jupyter notebook. Deploy one of...
    Downloads: 8 This Week
  • 10
    Lexbor

    Lexbor is an open source HTML renderer library under development

    Lexbor is a web browser engine under development, available as a software library; it ships with a free license and has no extra dependencies. For us, speed is an absolute must-have. In our development process, we focus on the fastest parsing techniques for HTML, CSS, and fonts, the fastest data processing methods, and the fastest ways to serve content to end users. Whether you are building a backend that handles millions of HTML documents or a UI-heavy user app, your software’s response rate always matters to users and developers alike. Lexbor’s code is optimized for ease of access in end-user applications and across programming languages. ...
    Downloads: 7 This Week
  • 11
    Matter AI

    Matter AI is an open-source AI code reviewer agent

    Matter AI is an AI-powered platform designed to enhance productivity through automated content generation, data analysis, and decision support. It leverages machine learning models to process text, analyze patterns, and generate insights, making it suitable for businesses looking to optimize data-driven decision-making. Matter AI integrates with various data sources and provides customizable AI workflows tailored to different industries.
    Downloads: 4 This Week
  • 12
    Hacks

    A collection of hacks and one-off scripts

    Hacks is a collection of experimental scripts, utilities, and one-off tools created to solve specific problems in security research, data processing, and automation. Rather than being a single cohesive application, it serves as a repository of practical command-line tools that can be used independently or combined into workflows. The scripts cover a wide range of tasks, including URL manipulation, parameter replacement, data extraction, and reconnaissance automation. ...
    Downloads: 0 This Week
  • 13
    PHP Code Coverage

    Collection, processing, and rendering functionality for PHP code

    The php-code-coverage library, authored by Sebastian Bergmann, enables collection, processing, and rendering of PHP code coverage data. It integrates with PHPUnit or other testing frameworks to track which lines, methods, or classes are executed during tests. The library supports generating detailed reports in formats like HTML, Clover, or XML, helping teams understand test completeness and identify untested code paths.
    Downloads: 0 This Week
  • 14
    Apache Flink

    Stream processing framework with powerful stream

    Apache Flink is a distributed engine for stateful computations over data streams and batches, designed for low-latency processing at scale. Its core runtime executes dataflow graphs with fine-grained backpressure and checkpointing, allowing applications to recover consistently from failures. Flink’s event-time model and watermarks enable accurate out-of-order processing, windowing, and complex time semantics that typical real-time systems struggle with.
    Downloads: 0 This Week
  • 15
    txtai

    Build AI-powered semantic search applications

    txtai executes machine-learning workflows to transform data and build AI-powered semantic search applications. Traditional search systems use keywords to find data. Semantic search applications have an understanding of natural language and identify results that have the same meaning, not necessarily the same keywords. Backed by state-of-the-art machine learning models, data is transformed into vector representations for search (also known as embeddings). Innovation is happening at a rapid...
    Downloads: 6 This Week
  • 16
    Bacalhau

    Community-driven, simple, yet powerful framework

    Bacalhau is a decentralized compute platform for running jobs on data stored across distributed networks, like IPFS or Filecoin, without moving the data to centralized cloud environments. It allows developers to run containerized workloads close to where the data lives, reducing latency, cost, and privacy risks. Bacalhau supports various runtime environments and is designed to make decentralized data processing as accessible as traditional cloud computing. ...
    Downloads: 5 This Week
  • 17
    protoactor-go

    Proto Actor - Ultra fast distributed actors for Go, C# and Java/Kotlin

    Built on cloud-native technologies. Taking advantage of proven stability and performance. Asynchronous and distributed by design. High-level abstractions like Actors and Virtual Grains. Capable of millions of messages per second in cross-process communication. Write systems that self-heal using supervisor hierarchies. The Actor Model provides a higher level of abstraction for writing concurrent and distributed systems. It relieves the developer of having to deal with explicit locking and...
    Downloads: 4 This Week
  • 18
    MathPHP

    Powerful modern math library for PHP

    MathPHP is a library that brings advanced mathematical functions and data analysis capabilities to PHP applications. It covers a wide range of topics, including linear algebra, calculus, statistics, probability, and numerical analysis. MathPHP is designed for developers and data scientists who require precise and efficient mathematical computations in PHP, making it suitable for scientific computing and data processing.
    Downloads: 6 This Week
  • 19
    julep

    A new DSL and server for AI agents and multi-step tasks

    Julep is a platform for creating AI agents that remember past interactions and can perform complex tasks. It offers long-term memory and manages multi-step processes. Julep enables the creation of multi-step tasks incorporating decision-making, loops, parallel processing, and integration with numerous external tools and APIs. While many AI applications are limited to simple, linear chains of prompts and API calls with minimal branching, Julep is built to handle more complex scenarios.
    Downloads: 0 This Week
  • 20
    Kotlin Dataframe

    Structured data processing in Kotlin

    A data frame is an abstraction for working with structured data. Essentially it’s a 2-dimensional table with labeled columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dictionary of series objects. The handiness of this abstraction is not in the table itself but in a set of operations defined on it. The Kotlin Dataframe library is an idiomatic Kotlin DSL defining such operations. The process of working with a data frame is often called data...
    Downloads: 4 This Week
  • 21
    hash-wasm

    Lightning fast hash functions using hand-tuned WebAssembly binaries

    ...The library supports a wide range of algorithms, including MD5, SHA variants, BLAKE, Argon2, bcrypt, and xxHash, making it suitable for applications ranging from security to data processing. By compiling optimized C implementations into WebAssembly, hash-wasm achieves significantly better performance compared to pure JavaScript alternatives while maintaining portability across platforms. It supports both simple one-shot hashing and advanced streaming modes, allowing developers to process large datasets incrementally. ...
    Downloads: 5 This Week
  • 22
    Smallpond

    A lightweight data processing framework built on DuckDB and 3FS

    smallpond is a lightweight distributed data processing framework built by DeepSeek, designed to scale DuckDB workloads over clusters using their 3FS (Fire-Flyer File System) backend. The idea is to preserve DuckDB’s fast analytics engine but lift it from single-node to multi-node settings, giving you the ability to operate on large datasets (e.g. petabyte scale) without moving to a heavyweight system like Spark.
    Downloads: 0 This Week
  • 23
    Apache InLong

    Apache InLong - a one-stop integration framework for massive data

    Apache InLong is a one-stop integration framework for massive data that provides automatic, secure and reliable data transmission capabilities. InLong supports both batch and stream data processing at the same time, which offers great power to build data analysis, modeling and other real-time applications based on streaming data. InLong (应龙) is a divine beast in Chinese mythology who guides the river into the sea, and it is regarded as a metaphor of the InLong system for reporting data streams. ...
    Downloads: 1 This Week
  • 24
    Colly

    Elegant Scraper and Crawler Framework for Golang

    Colly provides a clean interface to write any kind of crawler/scraper/spider. With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving. Clean API. Fast (>1k requests/sec on a single core). Manages request delays and maximum concurrency per domain. Automatic cookie and session handling. Sync/async/parallel scraping. Distributed scraping. Caching, automatic encoding of non-unicode responses. ...
    Downloads: 9 This Week
  • 25
    Oban

    Robust job processing in Elixir, backed by modern PostgreSQL

    ...It provides a simple and consistent API for scheduling and performing jobs, and it is built to be fault-tolerant and easy to monitor. Oban is fundamentally different from other background job processing tools because it retains job data for historic metrics and inspection. You can leave your application running indefinitely without worrying about jobs being lost or orphaned due to crashes.
    Downloads: 4 This Week