This project is dedicated to open source data quality and data management initiatives. Data quality includes profiling, filtering, governance, similarity checks, data enrichment/alteration, real-time alerting, basket analysis, bubble charts, warehouse validation, single customer view, etc.
This project will develop a high-performance data management platform.
- Big data support: Hive Thrift server support
- Format creation, format matching (phone, date, string and number), format standardization
- Fuzzy logic based similarity check, cardinality check between tables and files
- Export and import from XML, XLS or CSV format, PDF export
- File Analysis, Regex search, Standardization, DB search
- Complete DB Scan, SQL interface, Data Dictionary, Schema Comparison
- Statistical Analysis, Reporting (dimension and measure based)
- Pattern Matching, DeDuplication, Case matching, Basket Analysis, Distribution Chart
- Data generation and Data masking features
- Meta Data Information, Reverse engineering of Data Model
- Timeliness analysis, String length analysis
- MySQL, Oracle, Postgres, MS Access, Db2, SQL Server certified
- Ad Hoc reports and Analytics
- Record matching and linkage based on fuzzy logic
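
To give a feel for what a fuzzy similarity check and record match involve, here is a minimal sketch in Python. It is not this project's implementation: the function names `similarity` and `match_records`, the 0.85 threshold, and the use of the standard library's `difflib.SequenceMatcher` as the similarity measure are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means identical after normalization.

    Hypothetical helper: normalizes case and surrounding whitespace, then
    uses difflib's ratio as a stand-in for a fuzzy similarity score.
    """
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_records(left, right, threshold=0.85):
    """Pair values from two lists whose similarity meets the threshold.

    The threshold is an assumed default, not a value taken from the project.
    """
    matches = []
    for l_value in left:
        for r_value in right:
            score = similarity(l_value, r_value)
            if score >= threshold:
                matches.append((l_value, r_value, round(score, 2)))
    return matches


if __name__ == "__main__":
    customers_a = ["Jon Smith", "Acme Corp"]
    customers_b = ["John Smith", "Jane Doe", "acme corp "]
    for left_value, right_value, score in match_records(customers_a, customers_b):
        print(left_value, "<->", right_value, score)
```

The nested loop compares every pair, which is fine for small lists but grows quadratically; for large tables, a real tool would typically narrow the candidate pairs first.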
A newer version has been uploaded.
I am using it for data analysis and cleaning up my customer data. It works great. Thanks for a good product.
Cool product. So many features and so easy to run.