Stream Processing Tools for Linux

View 7 business solutions

Browse free open source Stream Processing tools and projects for Linux below. Use the toggles on the left to filter open source Stream Processing tools by OS, license, language, programming language, and project status.

  • Enterprise-grade ITSM, for every business Icon
    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity.

    Freshservice is an intuitive, AI-powered platform that helps IT, operations, and business teams deliver exceptional service without the usual complexity. Automate repetitive tasks, resolve issues faster, and provide seamless support across the organization. From managing incidents and assets to driving smarter decisions, Freshservice makes it easy to stay efficient and scale with confidence.
    Try it Free
  • Gen AI apps are built with MongoDB Atlas Icon
    Gen AI apps are built with MongoDB Atlas

    Build gen AI apps with an all-in-one modern database: MongoDB Atlas

    MongoDB Atlas provides built-in vector search and a flexible document model so developers can build, scale, and run gen AI apps without stitching together multiple databases. From LLM integration to semantic search, Atlas simplifies your AI architecture—and it’s free to get started.
    Start Free
  • 1
    Akka

    Akka

    Build concurrent, distributed, and resilient message-driven apps

    Build powerful reactive, concurrent, and distributed applications more easily. Akka is a toolkit for building highly concurrent, distributed, and resilient message-driven applications for Java and Scala. Actors and Streams let you build systems that scale up, using the resources of a server more efficiently, and out, using multiple servers. Building on the principles of The Reactive Manifesto Akka allows you to write systems that self-heal and stay responsive in the face of failures. Up to 50 million msg/sec on a single machine. Small memory footprint; ~2.5 million actors per GB of heap. Distributed systems without single points of failure. Load balancing and adaptive routing across nodes. Event Sourcing and CQRS with Cluster Sharding. Distributed Data for eventual consistency using CRDTs. Asynchronous non-blocking stream processing with backpressure.
    Downloads: 10 This Week
    Last Update:
    See Project
  • 2
    Benthos

    Benthos

    Fancy stream processing made operationally mundane

    Benthos is a high performance and resilient stream processor, able to connect various sources and sinks in a range of brokering patterns and perform hydration, enrichments, transformations and filters on payloads. It comes with a powerful mapping language, is easy to deploy and monitor, and ready to drop into your pipeline either as a static binary, docker image, or serverless function, making it cloud native as heck. Delivery guarantees can be a dodgy subject. Benthos processes and acknowledges messages using an in-process transaction model with no need for any disk persisted state, so when connecting to at-least-once sources and sinks it's able to guarantee at-least-once delivery even in the event of crashes, disk corruption, or other unexpected server faults. This behaviour is the default and free of caveats, which also makes deploying and scaling Benthos much simpler.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 3
    Best-of Python

    Best-of Python

    A ranked list of awesome Python open-source libraries

    This curated list contains 390 awesome open-source projects with a total of 1.4M stars grouped into 28 categories. All projects are ranked by a project-quality score, which is calculated based on various metrics automatically collected from GitHub and different package managers. If you like to add or update projects, feel free to open an issue, submit a pull request, or directly edit the projects.yaml. Contributions are very welcome! Ranked list of awesome python libraries for web development. Correctly generate plurals, ordinals, indefinite articles; convert numbers. Libraries for loading, collecting, and extracting data from a variety of data sources and formats. Libraries for data batch- and stream-processing, workflow automation, job scheduling, and other data pipeline tasks.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 4
    Acl

    Acl

    A powerful server and network library, including coroutine

    The Acl (Advanced C/C++ Library) project a is powerful multi-platform network communication library and service framework, supporting LINUX, WIN32, Solaris, FreeBSD, MacOS, AndroidOS, iOS. Many applications written by Acl run on these devices with Linux, Windows, iPhone and Android and serve billions of users. There are some important modules in Acl project, including network communcation, server framework, application protocols, multiple coders, etc. The common protocols such as HTTP/SMTP/ICMP//MQTT/Redis/Memcached/Beanstalk/Handler Socket are implemented in Acl, and the codec library such as XML/JSON/MIME/BASE64/UUCODE/QPCODE/RFC2047/RFC1035, etc., are also included in Acl. Acl also provides a unified abstract interface for popular databases such as Mysql, Postgresql, Sqlite. Using Acl library users can write database applications more easily, quickly and safely.
    Downloads: 3 This Week
    Last Update:
    See Project
  • Auth for GenAI | Auth0 Icon
    Auth for GenAI | Auth0

    Enable AI agents to securely access tools, workflows, and data with fine-grained control and just a few lines of code.

    Easily implement secure login experiences for AI Agents - from interactive chatbots to background workers with Auth0. Auth for GenAI is now available in Developer Preview
    Try free now
  • 5
    Amadeus

    Amadeus

    Harmonious distributed data analysis in Rust

    Amadeus is a high-performance, distributed data processing framework written in Rust, designed to offer an ergonomic and safe alternative to tools like Apache Spark. It provides both streaming and batch capabilities, allowing users to work with real-time and historical data at scale. Thanks to Rust’s memory safety and zero-cost abstractions, Amadeus delivers performance gains while reducing the complexity and bugs common in large-scale data pipelines. It emphasizes developer productivity through a fluent, expressive API and makes it easier to build composable and reliable data transformation pipelines without sacrificing speed or safety.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 6
    Lithops

    Lithops

    A multi-cloud framework for big data analytics

    Lithops is an open-source serverless computing framework that enables transparent execution of Python functions across multiple cloud providers and on-prem infrastructure. It abstracts cloud providers like IBM Cloud, AWS, Azure, and Google Cloud into a unified interface and turns your Python functions into scalable, event-driven workloads. Lithops is ideal for data processing, ML inference, and embarrassingly parallel workloads, giving you the power of FaaS (Function-as-a-Service) without vendor lock-in. It also supports hybrid cloud setups, object storage access, and simple integration with Jupyter notebooks.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 7
    Reactor Core

    Reactor Core

    Non-Blocking Reactive Foundation for the JVM

    Reactor Core is a foundational library for building reactive applications in Java, providing a powerful API for asynchronous, non-blocking programming.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 8
    fluentbit

    fluentbit

    Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX

    Fluent Bit is a super-fast, lightweight, and highly scalable logging and metrics processor and forwarder. It is the preferred choice for cloud and containerized environments. A robust, lightweight, and portable architecture for high throughput with low CPU and memory usage from any data source to any destination. Proven across distributed cloud and container environments. Highly available with I/O handlers to store data for disaster recovery. Granular management of data parsing and routing. Filtering and enrichment to optimize security and minimize cost. The lightweight, asynchronous design optimizes resource usage: CPU, memory, disk I/O, network. No more OOM errors! Integration with all your technology, cloud-native services, containers, streaming processors, and data backends. Fully event-driven design leverages the operating system API for performance and reliability. All operations to collect and deliver data are asynchronous.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 9
    Numaflow

    Numaflow

    Kubernetes-native platform to run massively parallel data/streaming

    Numaflow is a Kubernetes-native tool for running massively parallel stream processing. A Numaflow Pipeline is implemented as a Kubernetes custom resource and consists of one or more source, data processing, and sink vertices. Numaflow installs in a few minutes and is easier and cheaper to use for simple data processing applications than a full-featured stream processing platform.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Build Securely on Azure with Proven Frameworks Icon
    Build Securely on Azure with Proven Frameworks

    Lay a foundation for success with Tested Reference Architectures developed by Fortinet’s experts. Learn more in this white paper.

    Moving to the cloud brings new challenges. How can you manage a larger attack surface while ensuring great network performance? Turn to Fortinet’s Tested Reference Architectures, blueprints for designing and securing cloud environments built by cybersecurity experts. Learn more and explore use cases in this white paper.
    Download Now
  • 10
    Pathway

    Pathway

    Python ETL framework for stream processing, real-time analytics, LLM

    Pathway is an open-source framework designed for building real-time data applications using reactive and declarative paradigms. It enables seamless integration of live data streams and structured data into analytical pipelines with minimal latency. Pathway is especially well-suited for scenarios like financial analytics, IoT, fraud detection, and logistics, where high-velocity and continuously changing data is the norm. Unlike traditional batch processing frameworks, Pathway continuously updates the results of your data logic as new events arrive, functioning more like a database that reacts in real-time. It supports Python, integrates with modern data tools, and offers a deterministic dataflow model to ensure reproducibility and correctness.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 11
    Pyper

    Pyper

    Concurrent Python made simple

    Pyper is a Python-native orchestration and scheduling framework designed for modern data workflows, machine learning pipelines, and any task that benefits from a lightweight DAG-based execution engine. Unlike heavier platforms like Airflow, Pyper aims to remain lean, modular, and developer-friendly, embracing Pythonic conventions and minimizing boilerplate. It focuses on local development ergonomics and seamless transition to production environments, making it ideal for small teams and individuals needing a programmable and flexible orchestration solution without the overhead of enterprise systems.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    SnappyData

    SnappyData

    Memory optimized analytics database, based on Apache Spark

    SnappyData (aka TIBCO ComputeDB) is a distributed, in-memory optimized analytics database. SnappyData delivers high throughput, low latency, and high concurrency for a unified analytics workload. By fusing an in-memory hybrid database inside Apache Spark, it provides analytic query processing, mutability/transactions, access to virtually all big data sources and stream processing all in one unified cluster. One common use case for SnappyData is to provide analytics at interactive speeds over large volumes of data with minimal or no pre-processing of the dataset. For instance, there is no need to often pre-aggregate/reduce or generate cubes over your large data sets for ad-hoc visual analytics. This is made possible by smartly managing data in memory, dynamically generating code using vectorization optimizations, and maximizing the potential of modern multi-core CPUs. SnappyData enables complex processing on large data sets in sub-second timeframes.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13
    Watermill

    Watermill

    Building event-driven applications the easy way in Go

    Go library for building event-driven applications. Our goal was to create a tool that is easy to understand, even by junior developers. It doesn't matter if you want to do Event-driven architecture, CQRS, Event Sourcing or just stream MySQL Binlog to Kafka. Watermill was designed to process hundreds of thousands of messages per second. Every component is built in a way that allows you to configure it for your needs. You can also implement your own middleware for the router. Watermill is using proven technologies and has a strong unit and integration tests coverage for critical areas. Watermill is a Go library for working efficiently with message streams. It is intended for building event driven applications, enabling event sourcing, RPC over messages, sagas and basically whatever else comes to your mind. You can use conventional pub/sub implementations like Kafka or RabbitMQ, but also HTTP or MySQL binlog if that fits your use case.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 14
    go-streams

    go-streams

    A lightweight stream processing library for Go

    A lightweight stream processing library for Go. go-streams provides a simple and concise DSL to build data pipelines. In computing, a pipeline, also known as a data pipeline, is a set of data processing elements connected in series, where the output of one element is the input of the next one. The elements of a pipeline are often executed in parallel or in time-sliced fashion. Some amount of buffer storage is often inserted between elements.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 15
    text-dedup

    text-dedup

    All-in-one text de-duplication

    text-dedup is a Python library that enables efficient deduplication of large text corpora by using MinHash and other probabilistic techniques to detect near-duplicate content. This is especially useful for NLP tasks where duplicated training data can skew model performance. text-dedup scales to billions of documents and offers tools for chunking, hashing, and comparing text efficiently with low memory usage. It supports Jaccard similarity thresholding, parallel execution, and flexible deduplication strategies, making it ideal for cleaning web-scraped data, language model training datasets, or document archives.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 16
    Strings Edit

    Strings Edit

    String editing and formatting library for Ada

    Strings edit is a library that provides I/O facilities for integers, floating-point numbers, Roman numerals, and strings. Both input and output subroutines support string pointers for consequent stream processing. The output can be aligned in a fixed size field with padding. Numeric input can be checked against expected values range to be either saturated or to raise an exception. For floating-point output either relative or absolute output precision can be specified. UTF-8 encoded strings are supported, including wildcard pattern matching, sets and maps of code points, upper/lowercase, and other Unicode categorizations.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 17
    Sed.py is a python module to provide a easy way to do text stream processing. Just like the name of module, it likes to do the work that sed can do. But not in sed's way, it's in Python's way. To use this module, the knowledge of regexp is necessary.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 18
    A production stable Java utility library with convenience methods for string- and stream processing, file handling, XML, XSLTs and XPath, checksums, console formatting, and more. The project is developed by the State and University Library of Denmark
    Downloads: 1 This Week
    Last Update:
    See Project
  • 19
    activeinsight
    ActiveInsight provides real-time detection and reaction to events and patterns. It is a platform that enables the detection of meaningful events within multiple, high frequency, event streams.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 20
    Arroyo

    Arroyo

    Distributed stream processing engine in Rust

    Arroyo is a distributed stream processing engine written in Rust, designed to efficiently perform stateful computations on streams of data. Unlike traditional batch processing, streaming engines can operate on both bounded and unbounded sources, emitting results as soon as they are available.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    An experimental CEP (Complex Event Processing) engine. It implements the event stream processing as a library embeddable in C++ and Perl. Since then it has been renamed to Triceps, so please look at the new location https://sourceforge.net/projects/t
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    Bytewax

    Bytewax

    Python Stream Processing

    Bytewax is a Python framework that simplifies event and stream processing. Because Bytewax couples the stream and event processing capabilities of Flink, Spark, and Kafka Streams with the friendly and familiar interface of Python, you can re-use the Python libraries you already know and love. Connect data sources, run stateful transformations, and write to various downstream systems with built-in connectors or existing Python libraries. Bytewax is a Python framework and Rust distributed processing engine that uses a dataflow computational model to provide parallelizable stream processing and event processing capabilities similar to Flink, Spark, and Kafka Streams. You can use Bytewax for a variety of workloads from moving data à la Kafka Connect style all the way to advanced online machine learning workloads. Bytewax is not limited to streaming applications but excels anywhere that data can be distributed at the input and output.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    CocoIndex

    CocoIndex

    ETL framework to index data for AI, such as RAG

    CocoIndex is an open-source framework designed for building powerful, local-first semantic search systems. It lets users index and retrieve content based on meaning rather than keywords, making it ideal for modern AI-based search applications. CocoIndex leverages vector embeddings and integrates with various models and frameworks, including OpenAI and Hugging Face, to provide high-quality semantic understanding. It’s built for transparency, ease of use, and local control over your search data, distinguishing itself from closed, black-box systems. The tool is suitable for developers working on personal knowledge bases, AI search interfaces, or private LLM applications.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    Cosmos DB Spark

    Cosmos DB Spark

    Apache Spark Connector for Azure Cosmos DB

    Azure Cosmos DB Spark is the official connector for Azure CosmosDB and Apache Spark. The connector allows you to easily read to and write from Azure Cosmos DB via Apache Spark DataFrames in Python and Scala. It also allows you to easily create a lambda architecture for batch-processing, stream-processing, and a serving layer while being globally replicated and minimizing the latency involved in working with big data.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    DSPatch

    DSPatch

    The Refreshingly Simple C++ Dataflow Framework

    Webite: http://flowbasedprogramming.com DSPatch, pronounced "dispatch", is a powerful C++ dataflow framework. DSPatch is not limited to any particular domain or data type, from reactive programming to stream processing, DSPatch's generic, object-oriented API allows you to create virtually any dataflow system imaginable. *See also:* DSPatcher ( https://github.com/MarcusTomlinson/DSPatcher ): A cross-platform graphical tool for building DSPatch circuits. DSPatchables ( https://github.com/MarcusTomlinson/DSPatchables ): A DSPatch component repository.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next
Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.