Browse free open source Data Pipeline tools and projects for Linux below. Use the toggles on the left to filter open source Data Pipeline tools by OS, license, language, programming language, and project status.
Pentaho offers a comprehensive data integration and analytics platform.
lakeFS - Git-like capabilities for your object storage
StarRocks is a next-gen sub-second MPP database for full analytics
A ranked list of awesome Python open-source libraries
Privacy and Security focused Segment-alternative, in Golang
Conduit streams data between data stores. Kafka Connect replacement
Light-weight, flexible, expressive statistical data testing library
Next-Generation Event Processing Platform
Real-time, incremental ETL library for ML with record-level depend
Backstage is an open platform for building developer portals
Open-source data observability for analytics engineers
Python module that helps you build complex pipelines of batch jobs
A fast script language for Go
AutoGluon: AutoML for Image, Text, and Tabular Data
Open source annotation and labeling tool for image and video assets
A distributed and extensible workflow scheduler platform
Kestra is an infinitely scalable orchestration and scheduling platform
A lightweight stream processing library for Go
Build, run, and manage data pipelines for integrating data
Making DAG construction easier
The open standard for data logging
Design, automate, operate and publish data pipelines at scale
osDQ is dedicated to creating Apache Spark-based data pipelines using JSON
Open Source Data Orchestration for the Cloud
SeaTunnel is a distributed, high-performance data integration platform