In stringcmp.py, the Jaro halflen is currently set to:
halflen = max(len1,len2) / 2 + 1
According to http://en.wikipedia.org/wiki/Jaro–Winkler_distance, it should be:
halflen = max(len1,len2) / 2 - 1
With the test case "chunkumwong" and "ckwong", both python-Levenshtein and lingpipe return 0.4797979797979798, while febrl returns 0.7373737373737372. Changing the plus to a minus makes febrl return the same score as the other two.
Note that python-Levenshtein also has a bug in its halflen calculation:
https://github.com/miohtama/python-Levenshtein/issues/1
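For anyone who wants to check the two scores quoted above, here is a minimal sketch of plain Jaro similarity (no Winkler prefix boost). It is not febrl's actual code: the function name `jaro` and the `window_offset` parameter are assumptions made for illustration, and `//` is used so the integer division behaves the same under Python 3.

```python
def jaro(s1, s2, window_offset=-1):
    """Plain Jaro similarity (illustrative sketch, not febrl's implementation).

    window_offset is the term added to max(len1, len2) // 2:
    -1 follows the Wikipedia definition, +1 mimics the reported bug.
    """
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Matching window; clamp at 0 so very short strings still work.
    halflen = max(max(len1, len2) // 2 + window_offset, 0)

    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - halflen)
        hi = min(len2, i + halflen + 1)
        for j in range(lo, hi):
            # Take the first unmatched, equal character inside the window.
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Matched characters in order; positions that differ are half-transpositions.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2

    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3.0


print(jaro("chunkumwong", "ckwong"))                   # approximately 0.4798
print(jaro("chunkumwong", "ckwong", window_offset=1))  # approximately 0.7374
```

With the minus the window is 4, so only "c", "k" and "n" match (one transposition). With the plus the window grows to 6 and all six characters of "ckwong" match, which is where the higher score comes from.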
Just found this line in comparison.py:
halflen = max(len1,len2) / 2 -1 # Or + 1 ?? PC 3/11/2006
It looks like comparison.py and stringcmp.py each have their own implementation of Jaro-Winkler...
This has been fixed, thanks.