nightfly - 2005-07-01


Setting up a new cancer registry, I got permission to use another registry's database (same data structure). Only the linkage software is missing (they use Automatch, which is no longer available).

In my search for alternatives I tested Febrl and found it rather easy to configure, even with very little Python knowledge. Using dummy configurations for blocking and deduplication/matching, it took me about two days to get the dedupe and linkage scripts running (the data is already standardized; names and day of birth are encrypted).

The second step will be to adjust the blocking and matching criteria:

From the other registry I have a complete set of Automatch config files (8 undup and 8 match files with 8 passes each; only the first pass differs between the files). They use some arrays for matching. Example:

Undup-File1, Pass1
BLOCK1 CHAR Lastname
BLOCK1 CHAR Firstname
MATCH1 ARRAY CHAR Lastname  0.95 0.01
MATCH1 ARRAY CHAR Firstname 0.95 0.01
MATCH1 ARRAY CHAR Birthname 0.95 0.01
;MATCH1 ARRAY CHAR Prevname 0.95 0.01
MATCH1 CHAR Sex    0.98 0.5
MATCH1 CHAR DOB    0.98 0.03
MATCH1 CHAR MOB    0.98 0.08
MATCH1 CHAR JOB    0.98 0.02
MATCH1 CHAR Region 0.95 0.01

Is it possible to specify the same matching criteria in Febrl?

If yes, is there any conversion help available to transform Automatch scripts into Febrl configurations?
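
To make the question more concrete, here is a rough sketch of what such a conversion would have to extract from a pass. It assumes that the two numbers on each MATCH line are the m- and u-probabilities of the Fellegi-Sunter model and that a leading ';' comments a line out. It does not use any Febrl API; parse_pass and the dictionary keys are made-up names for illustration only. The agreement and disagreement weights are the usual log2(m/u) and log2((1-m)/(1-u)).

```python
import math

# Pass 1 of the undup file quoted above.
AUTOMATCH_PASS = """\
BLOCK1 CHAR Lastname
BLOCK1 CHAR Firstname
MATCH1 ARRAY CHAR Lastname  0.95 0.01
MATCH1 ARRAY CHAR Firstname 0.95 0.01
MATCH1 ARRAY CHAR Birthname 0.95 0.01
;MATCH1 ARRAY CHAR Prevname 0.95 0.01
MATCH1 CHAR Sex    0.98 0.5
MATCH1 CHAR DOB    0.98 0.03
MATCH1 CHAR MOB    0.98 0.08
MATCH1 CHAR JOB    0.98 0.02
MATCH1 CHAR Region 0.95 0.01
"""


def parse_pass(text):
    """Split one pass into blocking fields and match specifications.

    Hypothetical helper: assumes the last two tokens of a MATCH line
    are the m- and u-probabilities and the token before them is the field.
    """
    blocks, matches = [], []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip empty lines and lines assumed to be commented out with ';'.
        if not line or line.startswith(";"):
            continue
        tokens = line.split()
        keyword = tokens[0]
        if keyword.startswith("BLOCK"):
            blocks.append(tokens[-1])
        elif keyword.startswith("MATCH"):
            m, u = float(tokens[-2]), float(tokens[-1])
            matches.append({
                "field": tokens[-3],
                "array": "ARRAY" in tokens[1:-3],
                "m": m,
                "u": u,
                # Fellegi-Sunter weights derived from m and u.
                "agree_weight": math.log2(m / u),
                "disagree_weight": math.log2((1 - m) / (1 - u)),
            })
    return blocks, matches


blocks, matches = parse_pass(AUTOMATCH_PASS)
print("blocking fields:", blocks)
for spec in matches:
    print(f"{spec['field']:<10} array={spec['array']!s:<5} "
          f"agree={spec['agree_weight']:.2f} "
          f"disagree={spec['disagree_weight']:.2f}")
```

The resulting field names and weights would still have to be mapped by hand onto Febrl's own blocking index and field comparison definitions.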

Best wishes

Stefan Gawrich
Cancer Registry of Hesse