Showing 61 open source projects for "data processing"

View related business solutions
  • AI Agents That Actually Do the Work Icon
    AI Agents That Actually Do the Work

    Assign real work to AI teammates that know your projects, priorities, and deadlines.

    ClickUp's Super Agents run 24/7 inside your workspace: triaging bugs, drafting content, updating statuses, and routing tasks without being told twice. Connect them to 500+ tools and let them execute, not just suggest. Build custom agents in minutes that understand your workflows and act on them autonomously.
    Try ClickUp Free
  • Build Agents and Models on One Platform Icon
    Build Agents and Models on One Platform

    Everything you need to build production-ready agents and models. Access 200+ Google and third-party AI models and tools.

    Gemini Enterprise Agent Platform is Google Cloud's comprehensive platform for developers to build, scale, govern, and optimize agents and models. Choose from Google's most advanced models and third-party models like Anthropic's Claude Model Family.
    Try It Free
  • 1
    Numaflow

    Numaflow

    Kubernetes-native platform to run massively parallel data/streaming

    Numaflow is a Kubernetes-native tool for running massively parallel stream processing. A Numaflow Pipeline is implemented as a Kubernetes custom resource and consists of one or more source, data processing, and sink vertices. Numaflow installs in a few minutes and is easier and cheaper to use for simple data processing applications than a full-featured stream processing platform.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 2
    pdfcpu

    pdfcpu

    A PDF processor written in Go

    pdfcpu is a PDF processing library written in Go supporting encryption. It provides both an API and a CLI. Supported are all versions up to PDF 1.7 (ISO-32000). This is an effort to build a comprehensive PDF processing library from the ground up written in Go. Over time pdfcpu aims to support the standard range of PDF processing features and also any interesting use cases that may present themselves along the way. The main focus lies on strong support for batch processing and scripting via a...
    Downloads: 14 This Week
    Last Update:
    See Project
  • 3
    Kapacitor

    Kapacitor

    Open source framework for processing, monitoring, and alerting

    Open source framework for processing, monitoring, and alerting on time series data. Kapacitor is a real-time data processing engine for monitoring and alerting, specifically designed to work with time-series data from InfluxDB.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    go-streams

    go-streams

    A lightweight stream processing library for Go

    A lightweight stream processing library for Go. go-streams provides a simple and concise DSL to build data pipelines. In computing, a pipeline, also known as a data pipeline, is a set of data processing elements connected in series, where the output of one element is the input of the next one. The elements of a pipeline are often executed in parallel or in time-sliced fashion.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Compliant and Reliable File Transfers Backed by Top Security Certifications Icon
    Compliant and Reliable File Transfers Backed by Top Security Certifications

    Cerberus FTP Server delivers SOC 2 Type II certified security and FIPS 140-2 validated encryption.

    Stop relying on non-certified, legacy file transfer tools that creak under the weight of modern security demands. Get full audit trails, advanced access controls and more supported by an award-winning team of experts. Start your free 25-day trial today.
    Start Free Trial
  • 5
    Pachyderm

    Pachyderm

    Data-Centric Pipelines and Data Versioning

    ...Pachyderm provides a powerful solution to optimize data processing, MLOps, and ML Lifecycles.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    InfluxDB

    InfluxDB

    The open source time series database

    ...Time series is currently the fastest growing database category there is, and InfluxDB is here to ensure businesses can keep up. InfluxDB provides infrastructure and application monitoring, IoT monitoring and analytics and more. It has APIs for storing and querying data, processing it in the background for ETL or monitoring and alerting purposes. This data can also be visualized, explored and more to help businesses seize opportunities and make the best decisions. InfluxDB is easy to start and easy to scale. Learn more about it on https://www.influxdata.com/
    Downloads: 28 This Week
    Last Update:
    See Project
  • 7
    Hacks

    Hacks

    A collection of hacks and one-off scripts

    Hacks is a collection of experimental scripts, utilities, and one-off tools created to solve specific problems in security research, data processing, and automation. Rather than being a single cohesive application, it serves as a repository of practical command-line tools that can be used independently or combined into workflows. The scripts cover a wide range of tasks, including URL manipulation, parameter replacement, data extraction, and reconnaissance automation. ...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 8
    SigLens

    SigLens

    100x Efficient Log Management than Splunk

    Siglens is an open-source signal analysis toolkit designed for processing and visualizing time-series data, commonly used in scientific and engineering applications.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Miller

    Miller

    Miller is like awk, sed, cut, join, and sort for name-indexed data

    Miller is like awk, sed, cut, join, and sort for data formats such as CSV, TSV, JSON, JSON Lines, and positionally-indexed. With Miller, you get to use named fields without needing to count positional indices, using familiar formats such as CSV, TSV, JSON, JSON Lines, and positionally-indexed. Then, on the fly, you can add new fields which are functions of existing fields, drop fields, sort, aggregate statistically, pretty-print, and more. Miller operates on key-value-pair data while the...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Build Securely on AWS with Proven Frameworks Icon
    Build Securely on AWS with Proven Frameworks

    Lay a foundation for success with Tested Reference Architectures developed by Fortinet’s experts. Learn more in this white paper.

    Moving to the cloud brings new challenges. How can you manage a larger attack surface while ensuring great network performance? Turn to Fortinet’s Tested Reference Architectures, blueprints for designing and securing cloud environments built by cybersecurity experts. Learn more and explore use cases in this white paper.
    Download Now
  • 10
    Benthos

    Benthos

    Fancy stream processing made operationally mundane

    Benthos is a high performance and resilient stream processor, able to connect various sources and sinks in a range of brokering patterns and perform hydration, enrichments, transformations and filters on payloads. It comes with a powerful mapping language, is easy to deploy and monitor, and ready to drop into your pipeline either as a static binary, docker image, or serverless function, making it cloud native as heck. Delivery guarantees can be a dodgy subject. Benthos processes and...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 11
    Watermill

    Watermill

    Building event-driven applications the easy way in Go

    Go library for building event-driven applications. Our goal was to create a tool that is easy to understand, even by junior developers. It doesn't matter if you want to do Event-driven architecture, CQRS, Event Sourcing or just stream MySQL Binlog to Kafka. Watermill was designed to process hundreds of thousands of messages per second. Every component is built in a way that allows you to configure it for your needs. You can also implement your own middleware for the router. Watermill is...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    Nuclio

    Nuclio

    High-Performance Serverless event and data processing platform

    Nuclio is an open source and managed serverless platform used to minimize development and maintenance overhead and automate the deployment of data-science-based applications. Real-time performance running up to 400,000 function invocations per second. Portable across low laptops, edge, on-prem and multi-cloud deployments. The first serverless platform supporting GPUs for optimized utilization and sharing. Automated deployment to production in a few clicks from Jupyter notebook. Deploy one of...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 13
    paperless-gpt

    paperless-gpt

    Use LLMs and LLM Vision (OCR) to handle paperless-ngx

    paperless-gpt is an AI-powered extension for document management systems that enhances the capabilities of paperless-ngx by integrating large language models and vision-based OCR to automate document processing and organization. It is designed to transform scanned or uploaded documents into structured, searchable, and intelligently categorized data without requiring manual tagging or sorting. The system uses OCR combined with LLM reasoning to extract text, classify documents, and generate metadata such as tags, titles, and categories automatically. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 14
    ojg

    ojg

    Optimized JSON for Go

    Optimized JSON for Go is a high-performance parser with a variety of additional JSON tools. OjG is optimized to processing huge data sets where data does not necessarily conform to a fixed structure.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Neuroglancer

    Neuroglancer

    WebGL-based viewer for volumetric data

    ...The viewer is built with a multi-threaded architecture, separating rendering and data processing to ensure smooth performance even with massive datasets. Extensively used in neuroscience research, Neuroglancer supports integration with tools.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 16
    Bitalosdb

    Bitalosdb

    Bitalosdb is a high-performance KV storage engine

    BitalosDB is a distributed, high-performance key-value database designed for cloud-native applications. It is optimized for scalability, supporting large workloads while maintaining low latency and high availability.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Bacalhau

    Bacalhau

    Community-driven, simple, yet powerful framework

    Bacalhau is a decentralized compute platform for running jobs on data stored across distributed networks, like IPFS or Filecoin, without moving the data to centralized cloud environments. It allows developers to run containerized workloads close to where the data lives, reducing latency, cost, and privacy risks. Bacalhau supports various runtime environments and is designed to make decentralized data processing as accessible as traditional cloud computing. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 18
    getty

    getty

    Asynchronous network I/O library

    Getty is an asynchronous network I/O library developed in Golang. It operates on TCP, UDP, and WebSocket network protocols, providing a consistent interface EventListener. Within Getty, each connection (session) involves two separate goroutines. One handles the reading of TCP streams, UDP packets, or WebSocket packages, while the other manages the logic processing and writes responses into the network write buffer. If your logic processing might take a considerable amount of time, it's...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 19
    encoding

    encoding

    Go package containing implementations of efficient encoding

    Go package containing implementations of encoders and decoders for various data formats. At Segment, we do a lot of marshaling and unmarshaling of data when sending, queuing, or storing messages. The resources we need to provision on the infrastructure are directly related to the type and amount of data that we are processing. At the scale we operate at, the tools we choose to build programs can have a large impact on the efficiency of our systems.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Fast JSON

    Fast JSON

    Fast JSON parser and validator for Go

    ...Overall, it is a specialized tool for developers who need fine-grained control over JSON processing performance.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Argo Workflows

    Argo Workflows

    Workflow engine for Kubernetes

    ...Model multi-step workflows as a sequence of tasks or capture the dependencies between tasks using a directed acyclic graph (DAG). Easily run compute intensive jobs for machine learning or data processing in a fraction of the time using Argo Workflows on Kubernetes. Run CI/CD pipelines natively on Kubernetes without configuring complex software development products. Argo Workflows is the most popular workflow execution engine for Kubernetes. It can run 1000s of workflows a day, each with 1000s of concurrent tasks. Our users say it is lighter-weight, faster, more powerful, and easier to use. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 22
    ASNmap

    ASNmap

    CLI tool for mapping organization network ranges using ASN data

    ...Output can be generated in multiple formats including plain text, JSON, and CSV, enabling flexible data processing and analysis. asnmap also supports reading input from standard input and piping its results directly into other command line tools for chained workflows.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 23
    XLSX

    XLSX

    Go (golang) library for reading and writing XLSX files

    ...These can be used to modify the behavior of the resultant struct, in particular they replace the `…WithRowLimit` variants of those methods with the result of calling `xlsx.RowLimit` and they add the ability to define a custom backing store for the spreadsheet data to be held in whilst processing. The full API docs can be viewed using go’s built in documentation tool.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 24
    Colly

    Colly

    Elegant Scraper and Crawler Framework for Golang

    Colly provides a clean interface to write any kind of crawler/scraper/spider. With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving. Clean API. Fast (>1k request/sec on a single core) Manages request delays and maximum concurrency per domain. Automatic cookie and session handling. Sync/async/parallel scraping. Distributed scraping. Caching, automatic encoding of non-unicode responses. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Grafana Agent

    Grafana Agent

    Vendor-neutral programmable observability pipelines

    Grafana Agent is an OpenTelemetry Collector distribution with a configuration inspired by Terraform. It is designed to be flexible, performant, and compatible with multiple ecosystems such as Prometheus and OpenTelemetry. Grafana Agent is based on components. Components are wired together to form programmable observability pipelines for telemetry collection, processing, and delivery.
    Downloads: 6 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • Next
Auth0 Logo