AutoGluon: AutoML for Image, Text, and Tabular Data
lakeFS - Git-like capabilities for your object storage
Conduit streams data between data stores. Kafka Connect replacement
Pythonic tool for running machine-learning/high-performance workflows
Kestra is an infinitely scalable orchestration and scheduling platform
SeaTunnel is a distributed, high-performance data integration platform
A lightweight stream processing library for Go
A ranked list of awesome Python open-source libraries
A distributed and extensible workflow scheduler platform
Privacy- and security-focused Segment alternative, in Golang
Build, run, and manage data pipelines for integrating data
Backstage is an open platform for building developer portals
StarRocks is a next-gen sub-second MPP database for full analytics
The open standard for data logging
Lightweight, flexible, expressive statistical data testing library
Making DAG construction easier
A fast script language for Go
Open-source data observability for analytics engineers
Producer and consumer actors with back-pressure for Elixir
Automated Tool for Optimized Modelling
Python module that helps you build complex pipelines of batch jobs
Next-Generation Event Processing Platform
Open Source Data Orchestration for the Cloud
Pentaho offers a comprehensive data integration and analytics platform.
Real-time, incremental ETL library for ML with record-level depend