Feathr is a data and AI engineering platform that is widely used in production at LinkedIn for many years and was open sourced in 2022. It is currently a project under LF AI & Data Foundation. Define data and feature transformations based on raw data sources (batch and streaming) using Pythonic APIs. Register transformations by names and get transformed data(features) for various use cases including AI modeling, compliance, go-to-market and more. Share transformations and data(features) across team and company. Feathr is particularly useful in AI modeling where it automatically computes your feature transformations and joins them to your training data, using point-in-time-correct semantics to avoid data leakage, and supports materializing and deploying your features for use online in production.
Features
- Native cloud integration with simplified and scalable architecture
- Battle tested in production for more than 6 years: LinkedIn has been using Feathr in production for over 6 years and backed by a dedicated team
- Scalable with built-in optimizations: Feathr can process billions of rows and PB scale data with built-in optimizations such as bloom filters and salted joins
- Rich transformation APIs including time-based aggregations, sliding window joins, look-up features, all with point-in-time correctness for AI
- Pythonic APIs and highly customizable user-defined functions (UDFs) with native PySpark and Spark SQL support to lower the learning curve for all data scientists
- Unified data transformation API works in offline batch, streaming, and online environments