Showing 7 open source projects for "batch"

  • 1
    Pathway

    Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG

    ...Pathway is especially well-suited for scenarios like financial analytics, IoT, fraud detection, and logistics, where high-velocity, continuously changing data is the norm. Unlike traditional batch processing frameworks, Pathway continuously updates the results of your data logic as new events arrive, functioning more like a database that reacts in real time. It supports Python, integrates with modern data tools, and offers a deterministic dataflow model to ensure reproducibility and correctness.
    Downloads: 2 This Week
    See Project
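    A minimal sketch of the incremental model described above, assuming a directory of CSV files and a simple running aggregate; the paths, schema, and column names are illustrative, not taken from the project.

# Hedged sketch of a streaming Pathway pipeline; file paths, schema, and
# column names are assumptions for illustration only.
import pathway as pw


class Transaction(pw.Schema):
    account: str
    amount: float


# Read a directory of CSV files as an unbounded, continuously updated table.
transactions = pw.io.csv.read(
    "./transactions/", schema=Transaction, mode="streaming"
)

# The aggregate is recomputed incrementally as new rows arrive,
# rather than in one batch pass over a fixed dataset.
totals = transactions.groupby(pw.this.account).reduce(
    account=pw.this.account,
    total=pw.reducers.sum(pw.this.amount),
)

pw.io.csv.write(totals, "./totals.csv")
pw.run()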
  • 2
    Best-of Python

    A ranked list of awesome Python open-source libraries

    ...Correctly generate plurals, ordinals, indefinite articles; convert numbers. Libraries for loading, collecting, and extracting data from a variety of data sources and formats. Libraries for data batch- and stream-processing, workflow automation, job scheduling, and other data pipeline tasks.
    Downloads: 5 This Week
    See Project
  • 3
    Arroyo

    Distributed stream processing engine in Rust

    Arroyo is a distributed stream processing engine written in Rust, designed to efficiently perform stateful computations on streams of data. Unlike traditional batch processing, streaming engines can operate on both bounded and unbounded sources, emitting results as soon as they are available.
    Downloads: 0 This Week
    See Project
  • 4
    SageMaker Spark Container

    Docker image used to run data processing workloads

    ...It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing. The SageMaker Spark Container is a Docker image used to run batch data processing workloads on Amazon SageMaker using the Apache Spark framework. This repository contains the sources used to build the pre-built container images that run Spark jobs on Amazon SageMaker via the SageMaker Python SDK. The pre-built images are available in the Amazon Elastic Container Registry (Amazon ECR), and the repository also serves as a reference for anyone who wants to build customized Spark containers for use on Amazon SageMaker.
    Downloads: 0 This Week
    See Project
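    As a rough illustration of launching such a batch Spark job, here is a hedged sketch using the SageMaker Python SDK's PySparkProcessor; the role ARN, framework version, instance settings, script path, and S3 locations are placeholder assumptions.

# Hedged sketch: submitting a batch PySpark job to Amazon SageMaker with the
# SageMaker Python SDK. Role ARN, framework version, instance sizes, script
# path, and S3 URIs are placeholders, not values from the listing.
from sagemaker.spark.processing import PySparkProcessor

spark_processor = PySparkProcessor(
    base_job_name="spark-batch-job",
    framework_version="3.1",  # selects one of the pre-built Spark container images
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    instance_count=2,
    instance_type="ml.m5.xlarge",
)

# The script runs inside the Spark container image on the managed cluster.
spark_processor.run(
    submit_app="./preprocess.py",
    arguments=["--input", "s3://my-bucket/raw/", "--output", "s3://my-bucket/clean/"],
)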
  • 5
    Amadeus

    Harmonious distributed data analysis in Rust

    Amadeus is a high-performance, distributed data processing framework written in Rust, designed to offer an ergonomic and safe alternative to tools like Apache Spark. It provides both streaming and batch capabilities, allowing users to work with real-time and historical data at scale. Thanks to Rust’s memory safety and zero-cost abstractions, Amadeus delivers performance gains while reducing the complexity and bugs common in large-scale data pipelines. It emphasizes developer productivity through a fluent, expressive API and makes it easier to build composable and reliable data transformation pipelines without sacrificing speed or safety.
    Downloads: 0 This Week
    See Project
  • 6
    Cosmos DB Spark

    Apache Spark Connector for Azure Cosmos DB

    ...The connector allows you to easily read from and write to Azure Cosmos DB via Apache Spark DataFrames in Python and Scala. It also makes it easy to build a lambda architecture with batch processing, stream processing, and a serving layer, while Cosmos DB keeps the data globally replicated and minimizes the latency involved in working with big data.
    Downloads: 0 This Week
    See Project
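    Below is a hedged PySpark sketch of that batch read/write pattern. The format string and option names differ between connector versions, so the azure-cosmosdb-spark style shown here, along with the account, database, and container names, should be treated as assumptions.

# Hedged sketch of batch reads and writes against Azure Cosmos DB from
# PySpark. Format string, option names, and all account/database/container
# values are assumptions; check the connector version you actually use.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cosmos-batch").getOrCreate()

cosmos_config = {
    "Endpoint": "https://my-account.documents.azure.com:443/",
    "Masterkey": "<account-key>",
    "Database": "retail",
    "Collection": "orders",
}

# Batch read: load the container into a Spark DataFrame.
orders = (
    spark.read.format("com.microsoft.azure.cosmosdb.spark")
    .options(**cosmos_config)
    .load()
)

# Batch write: upsert aggregated results into another container.
daily_counts = orders.groupBy("order_date").count()
(
    daily_counts.write.format("com.microsoft.azure.cosmosdb.spark")
    .options(**{**cosmos_config, "Collection": "daily_counts", "Upsert": "true"})
    .mode("append")
    .save()
)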
  • 7
    Dataflow Java SDK

    Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines

    The Dataflow Java SDK is the open-source Java library that powers Apache Beam pipelines for Google Cloud Dataflow, a serverless and scalable platform for processing large datasets in both batch and stream modes. This SDK allows developers to write Beam-based pipelines in Java and execute them on Dataflow, taking advantage of features like autoscaling, dynamic work rebalancing, and fault-tolerant distributed processing. While it has largely been superseded by the unified Apache Beam SDKs, it remains relevant for legacy systems and offers insight into the underlying mechanisms that power scalable data workflows on Google Cloud.
    Downloads: 0 This Week
    See Project
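    The listed project is the Java SDK, but the Beam pipeline model it implements is the same across languages; for consistency with the other sketches on this page, here is a hedged batch word-count pipeline in Python using apache_beam with the Dataflow runner. The project ID, region, and bucket paths are placeholders.

# Hedged sketch of a Beam pipeline submitted to Google Cloud Dataflow.
# Shown in Python for consistency with the other examples; the listed
# project itself is the Java SDK. Project, region, and bucket are placeholders.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",      # execute on the managed Dataflow service
    project="my-gcp-project",
    region="us-central1",
    temp_location="gs://my-bucket/tmp/",
)

# A bounded (batch) word count; swapping the text source for an unbounded
# one such as Pub/Sub turns the same pipeline into a streaming job.
with beam.Pipeline(options=options) as p:
    (
        p
        | "Read" >> beam.io.ReadFromText("gs://my-bucket/input/*.txt")
        | "Split" >> beam.FlatMap(lambda line: line.split())
        | "Pair" >> beam.Map(lambda word: (word, 1))
        | "Count" >> beam.CombinePerKey(sum)
        | "Format" >> beam.MapTuple(lambda word, n: f"{word}: {n}")
        | "Write" >> beam.io.WriteToText("gs://my-bucket/output/counts")
    )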