A Data Generator


A tool that generates synthetic test data for record matchers




As the amount of information from multiple sources grows, it has become very hard to relate information to the correct real-life entities. Record matching software tries to solve this with machine learning techniques. To do this effectively, it is necessary to train the record matcher with test data that closely resembles real-life data. Hence, there is a need for a data generator that creates synthetic data for evaluating the quality and capability of record matching software.

A data generator creates high-quality test data that accounts for the various real-life data glitches introduced through means such as human data entry, voice dictation and data scanning. Data generation proceeds in several steps, such as org data creation, data grouping, pair generation, data mutation and matching data patterns. The data generator also mangles field values of the generated test data to introduce data errors, and correlates records in real-life contexts such as families, households and organizations.
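The data mutation step can be illustrated with a minimal sketch. It is not taken from the project's code: the function names `mutate_value` and `mutate_record`, the three edit types (swap, drop, double) and the sample field names are all assumptions, chosen to imitate simple keyboard-entry slips.

```python
import random


def mutate_value(value: str, rng: random.Random) -> str:
    """Apply one random single-character edit (hypothetical error model)."""
    if len(value) < 2:
        return value  # too short to edit meaningfully
    op = rng.choice(["swap", "drop", "double"])
    i = rng.randrange(len(value) - 1)
    if op == "swap":
        # Transpose two adjacent characters, as in a typing slip.
        return value[:i] + value[i + 1] + value[i] + value[i + 2:]
    if op == "drop":
        # Omit one character.
        return value[:i] + value[i + 1:]
    # Repeat one character.
    return value[:i] + value[i] + value[i:]


def mutate_record(record: dict, fields: list, rng: random.Random) -> dict:
    """Return a copy of the record with the named fields mangled."""
    mutated = dict(record)
    for name in fields:
        mutated[name] = mutate_value(record[name], rng)
    return mutated


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so a run can be repeated
    original = {"first_name": "Katherine", "city": "Boston"}
    print(mutate_record(original, ["first_name"], rng))
```

Passing the random generator in explicitly keeps the generated data reproducible, which helps when the same test set has to be regenerated to compare two record matchers.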

A Data Generator Web Site


  • Ability to generate Match, Hold and Differ record pairs
  • Support for various types of data mutations
  • Ability to correlate data records into real-life contexts such as families, households and organizations
  • Intra-group and inter-group pair generation
  • Attribute dependency support
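Intra-group and inter-group pair generation could look roughly like the sketch below. The function name `generate_pairs`, the group names and the record identifiers are hypothetical, and the sketch only enumerates candidate pairs; how the tool assigns Match, Hold or Differ to a pair is not described here and is left out.

```python
from itertools import combinations


def generate_pairs(groups: dict) -> tuple:
    """Split candidate record pairs into intra-group and inter-group lists.

    `groups` maps a group name (for example a household) to its records.
    """
    intra, inter = [], []
    # Intra-group: every unordered pair of records within one group.
    for members in groups.values():
        intra.extend(combinations(members, 2))
    # Inter-group: every pair with one record from each of two groups.
    for (_, a), (_, b) in combinations(groups.items(), 2):
        inter.extend((x, y) for x in a for y in b)
    return intra, inter


if __name__ == "__main__":
    groups = {"smith": ["r1", "r2", "r3"], "jones": ["r4", "r5"]}
    intra, inter = generate_pairs(groups)
    print(len(intra), len(inter))
```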



Additional Project Details



Intended Audience

Information Technology, Science/Research
