Using ChoiceMaker

Help
2011-08-09
2013-04-25
  • Robert Whitcomb

    Robert Whitcomb - 2011-08-09

    Hi Rick,

    A coworker and I are looking to use ChoiceMaker to deduplicate and link records that are stored in CSV files.  We've scoured the documentation as well as some of the source code, and it's not clear to us how to create a schema that relates to a flat file.  Can you give us a hand here, maybe point us at some specific documentation?  Also, to perform this task, we're assuming that CM Developer isn't available, so we'd need to craft the various files needed by Analyzer and Server by hand, correct?

    Thanks!

    -Del

     
  • Rick Hall

    Rick Hall - 2011-08-09

    Hi Del -

    I've just added some example record source and record-pair source files to CVS under the example project:

      model_projects/simple_person_matching/etc/traindata/rs/flafile-01.rs
      model_projects/simple_person_matching/etc/traindata/rs/flafile-01.txt

      model_projects/simple_person_matching/etc/traindata/mrps/flafile-01.mrps
      model_projects/simple_person_matching/etc/traindata/mrps/flafile-01.txt

    The files contain exactly the same data as the XML-based sources in the same directory. They use exactly the same schema, SimplePersonRecords.schema, without any alteration. (If you need to work with fixed-width fields rather than delimited fields, you would need to modify the schema; let me know, and I'll provide an example. There are also tools, albeit poorly documented, that can be used to convert data between XML and CSV sources, and between database records and text files. Again, let me know if these conversion tools are a pressing need, and I'll provide documentation.)
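
    To illustrate the delimited versus fixed-width distinction mentioned above, here is a minimal Python sketch. It is not ChoiceMaker code and does not reflect SimplePersonRecords.schema: the field names, the comma delimiter, and the column offsets are all hypothetical. It only shows why a fixed-width layout needs extra information (column positions) that a delimited layout does not.

```python
import csv
import io

# Hypothetical field names; NOT taken from SimplePersonRecords.schema.
FIELDS = ["record_id", "first_name", "last_name"]

# Hypothetical fixed-width layout: (field name, start offset, end offset).
FIXED_LAYOUT = [("record_id", 0, 4), ("first_name", 4, 14), ("last_name", 14, 24)]


def parse_delimited(text, delimiter=","):
    """Parse delimited rows: field boundaries come from the delimiter."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [dict(zip(FIELDS, row)) for row in reader if row]


def parse_fixed_width(text):
    """Parse fixed-width rows: field boundaries come from column offsets."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        records.append(
            {name: line[start:end].strip() for name, start, end in FIXED_LAYOUT}
        )
    return records


# The same two made-up records in both layouts.
rows = [("1001", "Ann", "Lee"), ("1002", "Bob", "Ray")]
delimited = "\n".join(",".join(row) for row in rows)
fixed = "\n".join(
    rid.ljust(4) + first.ljust(10) + last.ljust(10) for rid, first, last in rows
)

# Both parsers recover identical records from the two layouts.
assert parse_delimited(delimited) == parse_fixed_width(fixed)
```

    The delimited parser needs only the field order, while the fixed-width parser also needs the offsets, which is the kind of extra detail a record layout description has to carry for fixed-width data.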

    In other words, you might not need to do very much to work with your CSV data. Let me know if you have questions or if something is not clear after you've looked at the new example sources.

    Regarding CM Developer vs CM Analyzer, these aren't separate applications anymore. It used to be that CM Developer was CM Analyzer plus a special license that unlocked the ChoiceMaker compiler. A special license is no longer required, so CM Developer has quietly gone away.

    - Rick
     
  • Robert Whitcomb

    Robert Whitcomb - 2011-08-10

    Rick,

    Thanks very much for your reply!

    - Del
     
