DataCleaner is a data quality analysis application and a platform for data quality (DQ) solutions. At its core is a strong, extensible data profiling engine; extensions add data cleansing, transformation, enrichment, deduplication, matching and merging.

Website: http://datacleaner.github.io

Features

  • Profile and analyze your database within minutes
  • Access almost any datastore: Oracle, MySQL, PostgreSQL, MS SQL Server, MongoDB, CUBRID, CSV files, Excel spreadsheets, dBase and more
  • Discover patterns in your textual data with the Pattern Finder
  • Find out which values occur the most with the Value Distribution profile
  • Cleanse your contact details with name and address validations
  • Detect duplicates using fuzzy logic and configurable weights and thresholds
  • Merge your duplicates and create a single version of the truth
  • Write data back to relational databases, CSV files, Excel spreadsheets or MongoDB databases

License

GNU Library or Lesser General Public License version 3.0 (LGPLv3)

User Ratings

5 stars: 2
4 stars: 1
3 stars: 0
2 stars: 0
1 star: 0

Ease: 5 / 5
Features: 4 / 5
Design: 4 / 5
Support: 3 / 5

User Reviews

  • I use Duplicate Files Deleter as it is very effective. Try, It is 100% accurate and performs the scan quickly and cleans properly.
  • Very good tool to work with data profiling and data cleansing
  • Cool!

Additional Project Details

Intended Audience

Information Technology, Quality Engineers, Science/Research

User Interface

Java Swing, Web-based

Programming Language

Java

Database Environment

Firebird/InterBase, Flat-file, HSQL, JDBC, Microsoft SQL Server, MySQL, Oracle, Other network-based DBMS, PostgreSQL (pgsql), Project is a database conversion tool, Project is a database management tool, SQLite, XML-based

Related Categories

Java Data Warehousing Software, Java Information Analysis Software, Java Business Intelligence Software, Java Database Management Systems (DBMS), Java Data Quality Tool

Registered

2008-02-09

Find a Partner

Human Inference

Human Inference is the European market leader in data quality solutions. The solutions are based on natural language processing and contain a core of knowledge to provide our customers with the best quality possible.

Neopost Customer Information Management

Neopost Customer Information Management is a set of solutions and services that covers the entire lifecycle of customer information and communication management.

Add-ons & Plugins