In stringcmp.py, the Jaro match window halflen is currently set to:
halflen = max(len1,len2) / 2 + 1
According to http://en.wikipedia.org/wiki/Jaro–Winkler_distance, it should be:
halflen = max(len1,len2) / 2 - 1
With the test case "chunkumwong" and "ckwong", both python-Levenshtein and lingpipe return 0.4797979797979798, while febrl returns 0.7373737373737372. Changing the plus to a minus makes febrl return the same score as the other two.
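To make the difference easy to reproduce, here is a minimal standalone sketch of the Jaro similarity with the window offset exposed as a parameter. It is not febrl's code: the function name `jaro` and the `window_offset` argument are made up for illustration, and it uses `//` so the integer division behaves the same under Python 2 and Python 3.

```python
def jaro(s1, s2, window_offset=-1):
    """Jaro similarity; match window is max(len1, len2) // 2 + window_offset.

    window_offset=-1 follows the Wikipedia definition,
    window_offset=+1 mimics the current halflen formula (assumption).
    """
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 + window_offset, 0)

    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        # Only look at positions in s2 within `window` of position i.
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Matched characters in their original order in each string.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    # Half the number of positions where the two sequences disagree.
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2

    m = float(matches)
    return (m / len1 + m / len2 + (m - transpositions) / m) / 3.0


print(round(jaro("chunkumwong", "ckwong", window_offset=-1), 4))  # 0.4798
print(round(jaro("chunkumwong", "ckwong", window_offset=+1), 4))  # 0.7374
```

With the minus, the window is 4 and only "c", "n" and "k" match (one transposition). With the plus, the window is 6, so "w", "o" and "g" match as well (two transpositions), which accounts for the higher score.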