lakeFS - Git-like capabilities for your object storage
A lightweight stream processing library for Go
Build, run, and manage data pipelines for integrating data
Backstage is an open platform for building developer portals
Conduit streams data between data stores; a Kafka Connect replacement
The open standard for data logging
Privacy- and security-focused Segment alternative, in Golang
Kestra is an infinitely scalable orchestration and scheduling platform
SeaTunnel is a distributed, high-performance data integration platform
A distributed and extensible workflow scheduler platform
Making DAG construction easier
Open-source data observability for analytics engineers
AutoGluon: AutoML for Image, Text, and Tabular Data
Pythonic tool for running machine-learning and high-performance workflows
Next-Generation Event Processing Platform
Lightweight, flexible, expressive statistical data testing library
A ranked list of awesome Python open-source libraries
A fast script language for Go
Python module that helps you build complex pipelines of batch jobs
Producer and consumer actors with back-pressure for Elixir
Automated Tool for Optimized Modelling
StarRocks is a next-gen sub-second MPP database for full analytics
Open Source Data Orchestration for the Cloud
Real-time, incremental ETL library for ML with record-level depend
Code review for data in dbt