Browse free open source Data Pipeline tools and projects for Linux below.
Pentaho offers a comprehensive data integration and analytics platform.
A ranked list of awesome Python open-source libraries
Kestra is an infinitely scalable orchestration and scheduling platform
A distributed and extensible workflow scheduler platform
Backstage is an open platform for building developer portals
Privacy and Security focused Segment-alternative, in Golang
Open Source Data Orchestration for the Cloud
Automated Tool for Optimized Modelling
Open-source data observability for analytics engineers
Build, run, and manage data pipelines for integrating data
StarRocks is a next-gen sub-second MPP database for full analytics
lakeFS - Git-like capabilities for your object storage
Real-time, incremental ETL library for ML with record-level depend
Design, automate, operate and publish data pipelines at scale
osDQ is dedicated to creating Apache Spark-based data pipelines using JSON
Mirror of Apache Kafka
SeaTunnel is a distributed, high-performance data integration platform
AutoGluon: AutoML for Image, Text, and Tabular Data
BitSail is a distributed high-performance data integration engine
A FITS image data viewer & reducer, and UVIT Data Reduction Pipeline.
Conduit streams data between data stores. Kafka Connect replacement
Pythonic tool for running machine-learning/high performance workflows
Use SQL to build ELT pipelines on a data lakehouse
Open source annotation and labeling tool for image and video assets
Producer and consumer actors with back-pressure for Elixir