This forum is for discussing the best metrics or approaches for a given situation. If you are unsure of the ideal metric for a task (anything from biological RNA comparison to plagiarism detection), please ask here, and someone interested in the project will try to start a new thread whose subject describes the data type requiring a similarity test.
I'm considering using this utility, in conjunction with address normalization software, to find matching, deliberately obfuscated addresses in a database. For instance: 1 main street / one main st. / 1 maine / etc.
It sounds like a classic use of SimMetrics to me. Similar work has been done with similar techniques since the 1970s (merge/purge, etc.), and it has been highly successful in such tasks.
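To make the idea concrete, here is a rough sketch in plain Python of the normalize-then-compare approach. It does not use the SimMetrics API. The abbreviation table and the function names (normalize, levenshtein, similarity) are made up for illustration, and edit distance is only one of many metrics you could try, so treat it as a starting point under those assumptions rather than a recommended implementation.

```python
import re

# Hypothetical abbreviation table for illustration only; real address
# normalization software would cover far more cases.
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue", "one": "1"}


def normalize(address: str) -> str:
    """Lowercase, drop punctuation, and expand a few known abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Score in [0, 1]: 1.0 means identical after normalization."""
    a, b = normalize(a), normalize(b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    for candidate in ["one main st.", "1 maine"]:
        print(candidate, round(similarity("1 main street", candidate), 3))
```

With this table, "one main st." normalizes to the same string as "1 main street" and scores 1.0, while "1 maine" scores noticeably lower, so you would still need to pick a match threshold by testing against your own data.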