data processing free download

Showing 61 open source projects for "data processing"

1

Numaflow

Kubernetes-native platform to run massively parallel data/streaming jobs

Numaflow is a Kubernetes-native tool for running massively parallel stream processing. A Numaflow Pipeline is implemented as a Kubernetes custom resource and consists of one or more source, data processing, and sink vertices. Numaflow installs in a few minutes and is easier and cheaper to use for simple data processing applications than a full-featured stream processing platform.

Downloads: 3 This Week

Last Update: 6 days ago
2

pdfcpu

A PDF processor written in Go

pdfcpu is a PDF processing library written in Go that supports encryption. It provides both an API and a CLI, and supports all PDF versions up to PDF 1.7 (ISO-32000). This is an effort to build a comprehensive PDF processing library from the ground up in Go. Over time, pdfcpu aims to support the standard range of PDF processing features, along with any interesting use cases that arise along the way. The main focus is on strong support for batch processing and scripting via a...

Downloads: 14 This Week

Last Update: 2026-06-09
3

Kapacitor

Open source framework for processing, monitoring, and alerting

Kapacitor is an open source framework for processing, monitoring, and alerting on time series data. It is a real-time data processing engine for monitoring and alerting, designed specifically to work with time-series data from InfluxDB.
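Kapacitor's own scripting language is not shown here. As a rough illustration of what threshold alerting on a time series involves, the Go sketch below uses a hypothetical `Point` type and a fixed threshold, and emits an alert only when the level changes, so a sustained breach fires once rather than on every sample.

```go
package main

import "fmt"

// Point is a hypothetical time-series sample: a Unix timestamp and a value.
type Point struct {
	Time  int64
	Value float64
}

// Alert records a level change detected in the series.
type Alert struct {
	Time  int64
	Level string // "OK" or "CRITICAL"
}

// detectAlerts walks the series in time order and emits an Alert only when
// the level changes, so a value that stays above the threshold alerts once.
func detectAlerts(series []Point, threshold float64) []Alert {
	var alerts []Alert
	level := "OK"
	for _, p := range series {
		next := "OK"
		if p.Value > threshold {
			next = "CRITICAL"
		}
		if next != level {
			alerts = append(alerts, Alert{Time: p.Time, Level: next})
			level = next
		}
	}
	return alerts
}

func main() {
	cpu := []Point{{0, 40}, {10, 95}, {20, 97}, {30, 60}}
	for _, a := range detectAlerts(cpu, 90) {
		fmt.Printf("t=%d %s\n", a.Time, a.Level) // t=10 CRITICAL, then t=30 OK
	}
}
```

The sample at t=20 is still above the threshold but produces no new alert, because the level is already CRITICAL.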

Downloads: 0 This Week

Last Update: 2026-05-26
4

go-streams

A lightweight stream processing library for Go

A lightweight stream processing library for Go. go-streams provides a simple and concise DSL to build data pipelines. In computing, a pipeline, also known as a data pipeline, is a set of data processing elements connected in series, where the output of one element is the input of the next one. The elements of a pipeline are often executed in parallel or in time-sliced fashion.
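The pipeline definition above can be sketched with plain Go channels and goroutines. This is not the go-streams DSL; the stage names `generate`, `square` and `sum` are hypothetical. Each stage's output channel is the next stage's input, and the middle stage runs several workers in parallel, so output order is not preserved. That is why the sink aggregates with an order-independent sum.

```go
package main

import (
	"fmt"
	"sync"
)

// generate is the first element: it emits the inputs on a channel.
func generate(nums ...int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for _, n := range nums {
			out <- n
		}
	}()
	return out
}

// square is a middle element run by `workers` goroutines in parallel;
// its input is the previous element's output.
func square(in <-chan int, workers int) <-chan int {
	out := make(chan int)
	var wg sync.WaitGroup
	wg.Add(workers)
	for i := 0; i < workers; i++ {
		go func() {
			defer wg.Done()
			for n := range in {
				out <- n * n
			}
		}()
	}
	// Close the output only after every worker has finished.
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

// sum is the sink: it drains the channel and returns the total.
func sum(in <-chan int) int {
	total := 0
	for n := range in {
		total += n
	}
	return total
}

func main() {
	fmt.Println(sum(square(generate(1, 2, 3, 4), 3))) // 30
}
```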

Downloads: 0 This Week

Last Update: 2025-05-10
5

Pachyderm

Data-Centric Pipelines and Data Versioning

...Pachyderm provides a powerful solution to optimize data processing, MLOps, and ML Lifecycles.

Downloads: 0 This Week

Last Update: 2025-01-15
6

InfluxDB

The open source time series database

...Time series is currently the fastest-growing database category, and InfluxDB is here to ensure businesses can keep up. InfluxDB provides infrastructure and application monitoring, IoT monitoring, analytics, and more. It has APIs for storing and querying data and for processing it in the background for ETL, monitoring, or alerting. The data can also be visualized and explored to help businesses seize opportunities and make the best decisions. InfluxDB is easy to start and easy to scale. Learn more at https://www.influxdata.com/

Downloads: 28 This Week

Last Update: 5 days ago
7

Hacks

A collection of hacks and one-off scripts

Hacks is a collection of experimental scripts, utilities, and one-off tools created to solve specific problems in security research, data processing, and automation. Rather than being a single cohesive application, it serves as a repository of practical command-line tools that can be used independently or combined into workflows. The scripts cover a wide range of tasks, including URL manipulation, parameter replacement, data extraction, and reconnaissance automation. ...

Downloads: 5 This Week

Last Update: 2026-03-27
8

SigLens

100x More Efficient Log Management than Splunk

SigLens is an open-source signal analysis toolkit designed for processing and visualizing time-series data, commonly used in scientific and engineering applications.

Downloads: 0 This Week

Last Update: 2025-07-25
9

Miller

Miller is like awk, sed, cut, join, and sort for name-indexed data

Miller is like awk, sed, cut, join, and sort for data formats such as CSV, TSV, JSON, JSON Lines, and positionally-indexed. With Miller, you use named fields instead of counting positional indices. On the fly, you can add new fields that are functions of existing fields, drop fields, sort, aggregate statistically, pretty-print, and more. Miller operates on key-value-pair data while the...
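Miller itself is driven from the command line, and its verbs are not reproduced here. To illustrate the named-field idea, here is a Go sketch using only `encoding/csv`: it reads a header row, exposes each record as a map keyed by field name, and appends a derived field. `addField`, `total` and the sample columns are hypothetical.

```go
package main

import (
	"encoding/csv"
	"os"
	"strconv"
	"strings"
)

// addField reads CSV with a header row and appends a new column whose value
// is computed from existing columns looked up by name, not by position.
func addField(input, name string, compute func(rec map[string]string) string) ([][]string, error) {
	// csv.Reader rejects rows whose field count differs from the header's.
	rows, err := csv.NewReader(strings.NewReader(input)).ReadAll()
	if err != nil {
		return nil, err
	}
	if len(rows) == 0 {
		return rows, nil
	}
	header := rows[0]
	out := [][]string{append(append([]string{}, header...), name)}
	for _, row := range rows[1:] {
		rec := make(map[string]string, len(header))
		for i, h := range header {
			rec[h] = row[i]
		}
		out = append(out, append(append([]string{}, row...), compute(rec)))
	}
	return out, nil
}

// total is a hypothetical derived field: price * qty, formatted to 2 decimals.
// Parse errors are ignored to keep the sketch short.
func total(r map[string]string) string {
	p, _ := strconv.ParseFloat(r["price"], 64)
	q, _ := strconv.ParseFloat(r["qty"], 64)
	return strconv.FormatFloat(p*q, 'f', 2, 64)
}

func main() {
	data := "item,price,qty\napple,0.5,4\npear,1.25,2\n"
	rows, err := addField(data, "total", total)
	if err != nil {
		panic(err)
	}
	w := csv.NewWriter(os.Stdout)
	if err := w.WriteAll(rows); err != nil {
		panic(err)
	}
}
```

Run as-is, this prints the input with a `total` column appended (`2.00` for apple, `2.50` for pear). Because fields are looked up by name, reordering the input columns does not change the result.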

Downloads: 0 This Week

Last Update: 4 days ago
10

Benthos

Fancy stream processing made operationally mundane

Benthos is a high performance and resilient stream processor, able to connect various sources and sinks in a range of brokering patterns and perform hydration, enrichments, transformations and filters on payloads. It comes with a powerful mapping language, is easy to deploy and monitor, and ready to drop into your pipeline either as a static binary, docker image, or serverless function, making it cloud native as heck. Delivery guarantees can be a dodgy subject. Benthos processes and...

Downloads: 1 This Week

Last Update: 4 days ago
11

Watermill

Building event-driven applications the easy way in Go

Go library for building event-driven applications. Our goal was to create a tool that is easy to understand, even for junior developers. It works whether you want to do event-driven architecture, CQRS, or event sourcing, or just stream the MySQL binlog to Kafka. Watermill was designed to process hundreds of thousands of messages per second. Every component is built so that you can configure it for your needs, and you can also implement your own middleware for the router. Watermill is...

Downloads: 0 This Week

Last Update: 2026-05-13
12

Nuclio

High-Performance Serverless event and data processing platform

Nuclio is an open source and managed serverless platform used to minimize development and maintenance overhead and to automate the deployment of data-science-based applications. It delivers real-time performance, running up to 400,000 function invocations per second, and is portable across low-power devices, laptops, edge, on-prem, and multi-cloud deployments. It is the first serverless platform to support GPUs for optimized utilization and sharing, and it offers automated deployment to production in a few clicks from a Jupyter notebook. Deploy one of...

Downloads: 2 This Week

Last Update: 7 days ago
13

paperless-gpt

Use LLMs and LLM Vision (OCR) to handle paperless-ngx

paperless-gpt is an AI-powered extension for document management systems that enhances the capabilities of paperless-ngx by integrating large language models and vision-based OCR to automate document processing and organization. It is designed to transform scanned or uploaded documents into structured, searchable, and intelligently categorized data without requiring manual tagging or sorting. The system uses OCR combined with LLM reasoning to extract text, classify documents, and generate metadata such as tags, titles, and categories automatically. ...

Downloads: 2 This Week

Last Update: 2026-03-19
14

ojg

Optimized JSON for Go

Optimized JSON for Go (OjG) is a high-performance parser with a variety of additional JSON tools. OjG is optimized for processing huge data sets where the data does not necessarily conform to a fixed structure.

Downloads: 0 This Week

Last Update: 2026-03-17
15

Neuroglancer

WebGL-based viewer for volumetric data

...The viewer is built with a multi-threaded architecture, separating rendering and data processing to ensure smooth performance even with massive datasets. Extensively used in neuroscience research, Neuroglancer supports integration with tools.

Downloads: 2 This Week

Last Update: 10 hours ago
16

Bitalosdb

Bitalosdb is a high-performance KV storage engine

BitalosDB is a distributed, high-performance key-value database designed for cloud-native applications. It is optimized for scalability, supporting large workloads while maintaining low latency and high availability.

Downloads: 0 This Week

Last Update: 2026-03-10
17

Bacalhau

Community-driven, simple, yet powerful framework

Bacalhau is a decentralized compute platform for running jobs on data stored across distributed networks, like IPFS or Filecoin, without moving the data to centralized cloud environments. It allows developers to run containerized workloads close to where the data lives, reducing latency, cost, and privacy risks. Bacalhau supports various runtime environments and is designed to make decentralized data processing as accessible as traditional cloud computing. ...

Downloads: 1 This Week

Last Update: 2026-04-18
18

getty

Asynchronous network I/O library

Getty is an asynchronous network I/O library developed in Golang. It works over the TCP, UDP, and WebSocket network protocols and provides a consistent EventListener interface. In Getty, each connection (session) uses two separate goroutines: one reads TCP streams, UDP packets, or WebSocket packets, while the other handles the logic processing and writes responses into the network write buffer. If your logic processing might take a considerable amount of time, it's...

Downloads: 1 This Week

Last Update: 2025-09-19
19

encoding

Go package containing implementations of efficient encoding

Go package containing implementations of encoders and decoders for various data formats. At Segment, we do a lot of marshaling and unmarshaling of data when sending, queuing, or storing messages. The resources we need to provision on the infrastructure are directly related to the type and amount of data that we are processing. At the scale we operate at, the tools we choose to build programs can have a large impact on the efficiency of our systems.

Downloads: 0 This Week

Last Update: 2026-03-10
20

Fast JSON

Fast JSON parser and validator for Go

...Overall, it is a specialized tool for developers who need fine-grained control over JSON processing performance.

Downloads: 0 This Week

Last Update: 2026-05-09
21

Argo Workflows

Workflow engine for Kubernetes

...Model multi-step workflows as a sequence of tasks or capture the dependencies between tasks using a directed acyclic graph (DAG). Easily run compute intensive jobs for machine learning or data processing in a fraction of the time using Argo Workflows on Kubernetes. Run CI/CD pipelines natively on Kubernetes without configuring complex software development products. Argo Workflows is the most popular workflow execution engine for Kubernetes. It can run 1000s of workflows a day, each with 1000s of concurrent tasks. Our users say it is lighter-weight, faster, more powerful, and easier to use. ...

Downloads: 1 This Week

Last Update: 2026-06-10
22

ASNmap

CLI tool for mapping organization network ranges using ASN data

...Output can be generated in multiple formats including plain text, JSON, and CSV, enabling flexible data processing and analysis. asnmap also supports reading input from standard input and piping its results directly into other command line tools for chained workflows.

Downloads: 1 This Week

Last Update: 2026-03-08
23

XLSX

Go (golang) library for reading and writing XLSX files

...These can be used to modify the behavior of the resultant struct. In particular, they replace the `…WithRowLimit` variants of those methods with the result of calling `xlsx.RowLimit`, and they add the ability to define a custom backing store to hold the spreadsheet data while processing. The full API docs can be viewed using Go's built-in documentation tool.

Downloads: 1 This Week

Last Update: 2025-04-18
24

Colly

Elegant Scraper and Crawler Framework for Golang

Colly provides a clean interface for writing any kind of crawler, scraper, or spider. With Colly you can easily extract structured data from websites for a wide range of applications, such as data mining, data processing, or archiving. Features include a clean API; speed (>1k requests/sec on a single core); management of request delays and maximum concurrency per domain; automatic cookie and session handling; sync/async/parallel scraping; distributed scraping; caching; and automatic encoding of non-unicode responses. ...

Downloads: 0 This Week

Last Update: 2025-03-27
25

Grafana Agent

Vendor-neutral programmable observability pipelines

Grafana Agent is an OpenTelemetry Collector distribution with a configuration inspired by Terraform. It is designed to be flexible, performant, and compatible with multiple ecosystems such as Prometheus and OpenTelemetry. Grafana Agent is based on components. Components are wired together to form programmable observability pipelines for telemetry collection, processing, and delivery.

Downloads: 6 This Week

Last Update: 2025-06-18