In comparison.py, there is a formula to calculate the partial agreement weight, described as "modified after formula in Winkler and Thibaudeau, 1991, page 12".
Given agreement weight=10, disagreement weight=0, score=0.85, and threshold=0.8, the formula in Febrl will give:
10 - (1 - 0.85) / (1 - 0.8) * (10 + 0) = 2.5
On the other hand, assuming the field is first name, the formula in the paper (which has fixed thresholds) will give:
10 - (10 - 0) * (1 - 0.85) * 1.5 = 7.75
which is closer to what one would expect for a score of 0.85. Febrl's result of 2.5 is only a quarter of the agreement weight, which seems too low for a score above the threshold.
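To make the comparison easy to reproduce, here is a minimal sketch of both calculations as written above. The function names are hypothetical and this is not the actual code from comparison.py; the factor of 1.5 is simply the one used in the first name example above.

```python
def febrl_partial_weight(agree_w, disagree_w, score, threshold):
    """Partial agreement weight using the Febrl-style formula quoted above.

    Hypothetical helper, not the actual comparison.py code.
    """
    return agree_w - (1.0 - score) / (1.0 - threshold) * (agree_w + disagree_w)


def paper_partial_weight(agree_w, disagree_w, score, factor=1.5):
    """Partial agreement weight using the paper-style formula quoted above.

    The default factor of 1.5 is the value from the first name example;
    treating it as a parameter is an assumption made for illustration.
    """
    return agree_w - (agree_w - disagree_w) * (1.0 - score) * factor


# Values from the example: agreement 10, disagreement 0, score 0.85, threshold 0.8
# (rounded to hide floating-point noise)
print(round(febrl_partial_weight(10.0, 0.0, 0.85, 0.8), 2))  # 2.5
print(round(paper_partial_weight(10.0, 0.0, 0.85), 2))       # 7.75
```

In the Febrl-style version, the weight falls linearly from the agreement weight at score 1.0 to zero at the threshold (for these inputs), so a score only slightly above the threshold gets a weight close to the disagreement end.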
Anyway, I am not sure what would be a good solution.