I would like to ask about your experience with the plethora of metrics you have implemented. I will certainly read the comparison paper you cite on your page later, but can you give me some more insight into which metric is good for what? For example: is there evidence that some measure (such as Soundex) is the best one for comparing, say, names (perhaps taking into account permutations of surname and given name)?
It depends on the type of data. I myself learn the best approach for each field, but when this is not possible (no source for this at the moment, sorry), you can use common sense based on how the metrics work, see <a href="http://www.dcs.shef.ac.uk/~sam/stringmetrics.html">http://www.dcs.shef.ac.uk/~sam/stringmetrics.html</a>.
For names, a number of approaches could be best: <a href="http://www.dcs.shef.ac.uk/~sam/stringmetrics.html#soundex">Soundex</a> if the data is very messy, e.g. historical records.
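To illustrate why Soundex tolerates messy spellings, here is a minimal Python sketch of the common American Soundex rules. It is written from scratch for this thread and is not the library's implementation; the function name `soundex` is only illustrative.

```python
def soundex(name: str) -> str:
    """Return a 4-character American Soundex code (illustrative sketch)."""
    # Consonant groups that share a digit; vowels, H, W and Y get no digit.
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in group:
            codes[ch] = digit

    # Keep only the letters A-Z, upper-cased.
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""

    result = letters[0]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate equal codes
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again

    # Pad with zeros and truncate to one letter plus three digits.
    return (result + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))     # R163 R163
print(soundex("Chapman"), soundex("Chapmann"))  # C155 C155
```

Spelling variants that sound alike collapse to the same code, which helps with noisy records, but the code ignores word order and is quite coarse.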
With modern names, on the other hand, I found that the same name may be listed in different ways, e.g. <i>Rev Sam Chapman</i> or <i>Chapman, Sam</i>. In that case a <a href="http://www.dcs.shef.ac.uk/~sam/stringmetrics.html#monge">Monge Elkan distance</a> may be much better suited (it has quadratic time complexity, however, which may make it impractical in some cases).
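As a rough sketch of the Monge Elkan idea: each token of the first string is matched against its best-scoring token in the second string, and those best scores are averaged. The Python below is an illustration only, not the library's code. It assumes a normalised Levenshtein similarity as the inner token measure (a stand-in chosen for brevity), and the names `token_sim`, `tokenize` and `monge_elkan` are hypothetical.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with a rolling row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def token_sim(a: str, b: str) -> float:
    """Inner similarity in [0, 1]; an assumed stand-in, not the original choice."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def tokenize(s: str) -> list[str]:
    """Lower-case and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", s.lower())


def monge_elkan(s: str, t: str) -> float:
    """Average, over tokens of s, of the best match among tokens of t."""
    a, b = tokenize(s), tokenize(t)
    if not a or not b:
        return 0.0
    # Every token of s is compared with every token of t: the quadratic cost.
    return sum(max(token_sim(x, y) for y in b) for x in a) / len(a)


print(monge_elkan("Chapman, Sam", "Sam Chapman"))      # 1.0
print(monge_elkan("Chapman, Sam", "Rev Sam Chapman"))  # 1.0
```

Note that the measure is not symmetric: with the title on the left-hand side, "Rev" finds no good match and lowers the average, so the argument order matters. The nested loop over both token lists is where the quadratic cost comes from.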
Hope this helps,