Zingg
Scalable master data management and identity resolution
Zingg is an open-source entity resolution and master data management platform for finding duplicate, related, or matching records across large datasets. It uses machine learning to learn how records should be compared, reducing the need for brittle hand-written matching rules. The project is designed for data engineering and analytics teams working on customer 360, supplier 360, deduplication, fuzzy matching, data quality, and golden record workflows. Zingg runs on Apache Spark and scales to large datasets held in data lakes, warehouses, and cloud platforms. It supports configuration-driven pipelines in which users define input data, match fields, training data, models, and output destinations. ...
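To make the configuration-driven idea concrete, the following is a minimal sketch of what such a pipeline definition might contain, written as a Python dictionary that serializes to JSON. Every key name, path, and value here (for example `fieldDefinition`, `matchType`, `trainingSamples`, `zinggDir`, and the file locations) is an illustrative assumption rather than a verified copy of Zingg's configuration schema, and the helpers `missing_sections` and `match_fields` are hypothetical; consult the project documentation for the actual format.

```python
import json

# Hypothetical pipeline configuration. Key names and values are assumptions
# made for illustration; they are not taken from Zingg's documented schema.
config = {
    # Input data: where the records to be matched live.
    "data": [
        {
            "name": "customers",
            "format": "csv",
            "props": {"location": "data/customers.csv", "header": True},
        }
    ],
    # Match fields: which columns to compare and how strictly.
    "fieldDefinition": [
        {"fieldName": "first_name", "matchType": "fuzzy", "dataType": "string"},
        {"fieldName": "last_name", "matchType": "fuzzy", "dataType": "string"},
        {"fieldName": "email", "matchType": "exact", "dataType": "string"},
    ],
    # Training data: labeled pairs used to learn the matching model.
    "trainingSamples": [
        {
            "name": "labeled_pairs",
            "format": "csv",
            "props": {"location": "data/labeled_pairs.csv", "header": True},
        }
    ],
    # Model: an identifier and a directory for model artifacts.
    "modelId": "100",
    "zinggDir": "models",
    # Output destination for the matched records.
    "output": [
        {
            "name": "matches",
            "format": "csv",
            "props": {"location": "out/matches", "header": True},
        }
    ],
}

# Sections this sketch treats as mandatory (an assumption, not a Zingg rule).
REQUIRED_SECTIONS = ("data", "fieldDefinition", "modelId", "zinggDir", "output")


def missing_sections(cfg):
    """Return the required sections that are absent or empty in cfg."""
    return [key for key in REQUIRED_SECTIONS if not cfg.get(key)]


def match_fields(cfg):
    """Return the names of the columns that take part in matching."""
    return [field["fieldName"] for field in cfg["fieldDefinition"]]


if __name__ == "__main__":
    problems = missing_sections(config)
    if problems:
        raise SystemExit(f"Configuration is missing: {', '.join(problems)}")
    print("Matching on:", ", ".join(match_fields(config)))
    print(json.dumps(config, indent=2))
```

Keeping the whole pipeline in one declarative document like this means the same matching job can be re-run against a different input or output location by changing the configuration alone, without touching any matching logic.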