Metadata/data identification Java library. Identifies Base Type (e.g. Boolean, Double, Long, String, LocalDate, LocalTime, ...) and Semantic Type information (e.g. Gender, Age, Color, Country, ...). Offers extensive country/language support, extensibility via user-defined plugins, and comprehensive profiling support. It is sufficiently fast to be used inline (see Speed notes below), and Semantic Type detection is designed to minimize false positives (see Performance notes below). Once a stream has been profiled, subsequent samples can be validated and/or new samples can be generated.
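To make the Base Type idea concrete, here is a minimal sketch of one way to infer a column's base type: try candidate types from most to least specific and keep the first one that parses every non-blank sample. The class `BaseTypeSketch`, its `infer` method, and the ISO-8601-only date/time handling are illustrative assumptions, not the library's API or algorithm. A production detector would also need locale-aware formats and tolerance for a few bad values, which this sketch omits.

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not the library's API or algorithm.
final class BaseTypeSketch {
    // Candidate base types, ordered from most to least specific.
    enum BaseType { BOOLEAN, LONG, DOUBLE, LOCALDATE, LOCALTIME, STRING }

    private BaseTypeSketch() { }

    // True if the (already trimmed) sample can be parsed as the given type.
    static boolean accepts(BaseType type, String s) {
        try {
            switch (type) {
                case BOOLEAN:
                    return s.equalsIgnoreCase("true") || s.equalsIgnoreCase("false");
                case LONG:
                    Long.parseLong(s);
                    return true;
                case DOUBLE:
                    Double.parseDouble(s); // note: also accepts "NaN", "1e3", ...
                    return true;
                case LOCALDATE:
                    LocalDate.parse(s);    // ISO-8601 only, e.g. 2024-03-15
                    return true;
                case LOCALTIME:
                    LocalTime.parse(s);    // ISO-8601 only, e.g. 09:30:00
                    return true;
                default:
                    return true;           // STRING accepts anything
            }
        } catch (NumberFormatException | DateTimeParseException e) {
            return false;
        }
    }

    // Returns the first candidate type that accepts every non-blank sample.
    static BaseType infer(List<String> samples) {
        List<String> values = new ArrayList<>();
        for (String raw : samples) {
            String s = raw.trim();
            if (!s.isEmpty()) values.add(s); // blanks are treated as missing
        }
        if (values.isEmpty()) return BaseType.STRING; // nothing to go on
        for (BaseType type : BaseType.values()) {
            boolean all = true;
            for (String s : values) {
                if (!accepts(type, s)) { all = false; break; }
            }
            if (all) return type;
        }
        return BaseType.STRING; // not reached in practice: STRING accepts everything
    }
}
```

For example, `["12", " -7 ", ""]` is inferred as `LONG`, while adding `"2.5"` to that column widens it to `DOUBLE`. Ordering the candidates from narrowest to widest is what makes the first match the most informative type.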
Features
- Large set of built-in Semantic Types (extensible via JSON defined plugins)
- Extensive Profiling metrics (e.g. Min, Max, Distinct, signatures, …)
- Minimal false positives for Semantic Type detection
- Usable in Streaming, Bulk or Record mode
- Broad country/language support, including US, Canada, Mexico, Brazil, UK, Australia, much of Europe, Japan and China
- Support for sharded analysis (i.e. analysis results from separate shards can be merged)
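Profiling metrics such as Min, Max and Distinct only support sharded analysis if each shard keeps mergeable state rather than just its final numbers. A minimal sketch, assuming a Long-valued column and exact distinct tracking; `ShardProfile`, `train` and `merge` are hypothetical names chosen for illustration, not the library's API:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only: per-shard profiling state that can be merged.
final class ShardProfile {
    private long count;
    private long min = Long.MAX_VALUE; // sentinel: an empty profile merges as a no-op
    private long max = Long.MIN_VALUE;
    // Exact distinct tracking; memory grows with the number of distinct values.
    private final Set<Long> distinct = new HashSet<>();

    // Streaming mode: feed one value at a time.
    void train(long value) {
        count++;
        min = Math.min(min, value);
        max = Math.max(max, value);
        distinct.add(value);
    }

    // Combines two shard profiles into a new one; neither input is modified.
    static ShardProfile merge(ShardProfile a, ShardProfile b) {
        ShardProfile out = new ShardProfile();
        out.count = a.count + b.count;
        out.min = Math.min(a.min, b.min);
        out.max = Math.max(a.max, b.max);
        out.distinct.addAll(a.distinct); // union, not sum, of distinct values
        out.distinct.addAll(b.distinct);
        return out;
    }

    long count() { return count; }
    long min() { return min; }
    long max() { return max; }
    int distinctCount() { return distinct.size(); }
}
```

With shards {5, 3, 5} and {3, 9}, the merged profile has count 5, min 3, max 9 and 3 distinct values, whereas adding the per-shard distinct counts would give 4. Count, Min and Max merge from their final values alone, but Distinct needs the underlying set (or an approximate structure, if memory must be bounded).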