Is there any information on which method is best for matching healthcare provider data on each of these fields:
1) Doctor First and last name
2) Practice address
3) Hospital name
1) This depends on how clean the data is. Is it just transliteration differences and occasional spelling errors, or are first and last names sometimes switched? Are there middle names and titles in some versions, or is it always first_name space last_name? Are the names international (I expect so)?
Assuming international names with only transliteration and spelling errors, I would recommend SmithWatermanGotoh.
2) MongeElkan perhaps, since it compares token by token, but again it is hard to say without data examples.
3) JaroWinkler or SmithWatermanGotoh seem best suited to the minor deviations I would expect here: spelling, added hyphens, capitalisation, etc.
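To make the suggestions above a bit more concrete, here is a minimal Python sketch of two of the metrics mentioned, Jaro-Winkler and Monge-Elkan, written from scratch rather than taken from any particular library. The function names, the `normalize` helper, the four-character prefix limit, and the 0.1 prefix scale are my own assumptions for illustration (some implementations only apply the prefix boost above a threshold, which is left out here), and SmithWatermanGotoh is not shown. Treat it as something to experiment with on your own data, not a reference implementation.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, turn punctuation (hyphens, dots, ...) into spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3.0


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1.0 - base)


def monge_elkan(a: str, b: str, inner=jaro_winkler) -> float:
    """Average, over tokens of a, of the best inner similarity to any token of b.

    Note that this is not symmetric: monge_elkan(a, b) may differ from
    monge_elkan(b, a).
    """
    tokens_a, tokens_b = a.split(), b.split()
    if not tokens_a or not tokens_b:
        return 0.0
    best = [max(inner(ta, tb) for tb in tokens_b) for ta in tokens_a]
    return sum(best) / len(best)


if __name__ == "__main__":
    # Hypothetical example values, not real provider data.
    print(jaro_winkler(normalize("Jon Smith"), normalize("John Smith")))
    print(monge_elkan(normalize("12 High Street"), normalize("High Street 12")))
```

Because Monge-Elkan scores each token against its best counterpart, reordered address parts such as "12 High Street" and "High Street 12" still score as identical after normalization, which is the reason it looks attractive for the address field. Jaro-Winkler on the whole string is more sensitive to word order, so for names with switched first and last parts you would probably want to compare tokens separately as well.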
Out of interest, you may also want to look at the following paper:
Jaro, M. A. (1995) "Probabilistic linkage of large public health data files", Statistics in Medicine 14:491-498