Metadata/data identification Java library. Identifies Base Type (e.g. Boolean, Double, Long, String, LocalDate, LocalTime, ...) and Semantic Type information (e.g. Gender, Age, Color, Country, ...). Extensive country/language support. Extensible via user-defined plugins. Comprehensive Profiling support. Large set of built-in Semantic Types (extensible via JSON defined plugins).
Data quality analysis, profiling, cleansing, duplicate detection +more
DataCleaner is a data quality analysis application and a solution platform for DQ solutions. It's core is a strong data profiling engine, which is extensible and thereby adds data cleansing, transformations, enrichment, deduplication, matching and merging.
Website: http://datacleaner.github.io