This is probably a silly question, but I'm very new to all of this and I need some guidance. Any help would be most appreciated!
I have a deduplication project where some of the records have a health index. I want records that share the same health index to be matched together. Not all of the records have this index, however, so I also want the other records matched probabilistically by name and date of birth. So far I've set up the health index field as a Str-Exact comparison, with a very high weight for agreement, a medium weight for a missing value and zero for disagreement (I want agreement to lead to a match and disagreement to lead to a non-match). The name and date of birth fields use the Winkler and Date comparison functions, respectively.
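To make the intended scheme concrete, here is a minimal, Febrl-independent sketch of how a three-way exact comparison feeds into a summed pair weight. The function names and the weight values (20.0, 2.0, 0.0) are hypothetical illustrations, not Febrl's API or defaults.

```python
# Hypothetical weights for illustration only; not Febrl defaults.
AGREE_WEIGHT = 20.0
MISSING_WEIGHT = 2.0
DISAGREE_WEIGHT = 0.0


def exact_field_weight(val_a, val_b, agree=AGREE_WEIGHT,
                       missing=MISSING_WEIGHT, disagree=DISAGREE_WEIGHT):
    """Weight for an exact string comparison of one field (hypothetical helper)."""
    # Treat None or whitespace-only strings as missing values.
    a = (val_a or "").strip()
    b = (val_b or "").strip()
    if not a or not b:
        return missing
    return agree if a == b else disagree


def total_weight(field_weights):
    """Sum the per-field weights into one matching weight for a record pair."""
    return sum(field_weights)


print(exact_field_weight("HI123", "HI123"))  # 20.0
print(exact_field_weight("HI123", ""))       # 2.0
print(exact_field_weight("HI123", "HI999"))  # 0.0
```

One thing this makes visible: because the field weights are added together, a disagreement weight of zero does not by itself force a non-match. A pair with different health indexes can still score above the match threshold on name and date of birth alone; pulling such pairs down would need a negative disagreement weight.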
My main problem appears to be that the agreement weight is never applied to the health index when records do match: it is almost invariably the missing weight that gets applied (this is from inspecting the weights output file). It is as if the health index comparison fails every time, with each comparison being made against a missing value. I've checked the weights file, and comparisons between two records with the same health index still yield the missing-value weight. This makes me think that the problem doesn't lie with my choice of blocking indexes.
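If every pair gets the missing weight, one possibility worth ruling out is that the field is not reaching the comparison at all, for example because the field name used in the comparison does not match the column name in the input. A quick check outside Febrl is to count how many records have a non-empty value under the exact field name being compared. This is a generic sketch assuming CSV input; the column names and sample data below are hypothetical.

```python
import csv
import io


def count_non_empty(rows, field):
    """Count rows whose value for `field` is present and not blank."""
    return sum(1 for row in rows if (row.get(field) or "").strip())


# Hypothetical sample data; replace with the real input file.
sample = io.StringIO(
    "rec_id,given_name,surname,dob,health_index\n"
    "r1,anna,smith,19800101,HI123\n"
    "r2,ana,smith,19800101,HI123\n"
    "r3,john,brown,19751231,\n"
)
rows = list(csv.DictReader(sample))

print(count_non_empty(rows, "health_index"))  # 2
print(count_non_empty(rows, "health_idx"))    # 0 (misspelled name: every value looks missing)
```

A count of zero (or far fewer than expected) would point to a field-definition problem rather than to the weights themselves.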
I'm using Febrl 4.01, both through the GUI and by running the resulting Python file from the console.
Does anyone have any ideas?
Thanks for your time,
Could the classifier be causing this?
Have you experimented with a full index?
I had a similar problem: I got zero weights when strings matched exactly. If you deselect the 'use dedup indexing' box on the Index page, it should fix that problem.