Deequ
Deequ is a library built by Amazon (AWS Labs) on top of Apache Spark for automated data quality testing, constraint verification, and anomaly detection at scale. Users declare assertions or constraints about their data (for example completeness, uniqueness, minimum and maximum values, or correlations between columns), and Deequ computes the corresponding metrics and verifies that the data meets those expectations. It can also suggest constraints and detect drift or anomalies. Integrated into a data pipeline, these checks catch bad data early, before it reaches downstream systems or ML models.
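The declare-then-verify idea described above can be illustrated with a small, library-free sketch. This is not Deequ's API and it does not use Spark: the names `Constraint`, `verify`, `completeness`, `uniqueness`, and `minimum`, the metric definitions, and the sample rows are all hypothetical, chosen only to show how a metric is computed first and an assertion is then applied to it.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical in-memory stand-in for a dataset: a list of dicts,
# where None marks a missing value. Deequ itself works on Spark data.


def completeness(rows: list, column: str) -> float:
    """Fraction of rows whose value in `column` is not None."""
    if not rows:
        return 1.0
    return sum(r.get(column) is not None for r in rows) / len(rows)


def uniqueness(rows: list, column: str) -> float:
    """Fraction of non-null values in `column` that occur exactly once."""
    values = [r.get(column) for r in rows if r.get(column) is not None]
    if not values:
        return 1.0
    counts = Counter(values)
    return sum(1 for v in values if counts[v] == 1) / len(values)


def minimum(rows: list, column: str) -> Optional[float]:
    """Smallest non-null value in `column`, or None if there is none."""
    values = [r.get(column) for r in rows if r.get(column) is not None]
    return min(values) if values else None


@dataclass(frozen=True)
class Constraint:
    """A named metric plus an assertion on that metric's value."""
    name: str
    metric: Callable[[list], Optional[float]]
    assertion: Callable[[Optional[float]], bool]


def verify(rows: list, constraints: list) -> dict:
    """Compute each metric and map constraint name -> (value, passed)."""
    results = {}
    for c in constraints:
        value = c.metric(rows)
        results[c.name] = (value, c.assertion(value))
    return results


# Illustrative data: a duplicated id and one missing price.
rows = [
    {"id": 1, "price": 9.5},
    {"id": 2, "price": None},
    {"id": 2, "price": 3.0},
]

constraints = [
    Constraint("id is complete",
               lambda rs: completeness(rs, "id"),
               lambda v: v == 1.0),
    Constraint("id is unique",
               lambda rs: uniqueness(rs, "id"),
               lambda v: v == 1.0),
    Constraint("price is non-negative",
               lambda rs: minimum(rs, "price"),
               lambda v: v is not None and v >= 0),
]

report = verify(rows, constraints)
for name, (value, passed) in report.items():
    print(f"{name}: metric={value} -> {'ok' if passed else 'FAILED'}")
```

Keeping the metric separate from the assertion means the same computed values could be stored and compared across runs, which is the kind of input a drift or anomaly check would need. In a pipeline, a failed constraint would typically stop the job or quarantine the batch before it is passed downstream.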