## #16 formula to calculate partial agreement weight

5
2011-12-14
2011-09-01
Sok Ann Yap
In comparison.py, there is a formula to calculate partial agreement weight, "modified after formula in Winkler and Thibaudeau, 1991, page 12".

Given agreement weight=10, disagreement weight=0, score=0.85, and threshold=0.8, the formula in Febrl will give:

10 - (1 - 0.85) / (1 - 0.8) * (10 + 0) = 2.5

On the other hand, assuming the field is first name, the formula in the paper (which has fixed thresholds) will give:

10 - (10 - 0) * (1 - 0.85) * 1.5 = 7.75

which is closer to what one would expect for a score of 0.85. The calculated weight from Febrl is just too low.

Anyway, I am not sure what would be a good solution.
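To make the comparison easy to reproduce, here is a minimal sketch of the two calculations above. The function names are hypothetical and are not taken from comparison.py. The `factor=1.5` constant and the way the disagreement weight is added as a magnitude are copied from the arithmetic in this ticket, not checked against the paper or the Febrl source.

```python
def febrl_partial_weight(sim, threshold, agree_w, disagree_w):
    """Threshold-scaled partial weight, following the ticket's Febrl arithmetic.

    Assumes threshold <= sim <= 1 and threshold < 1. disagree_w is added
    as a magnitude, as in the worked example (an assumption).
    """
    return agree_w - (1.0 - sim) / (1.0 - threshold) * (agree_w + disagree_w)


def paper_partial_weight(sim, agree_w, disagree_w, factor=1.5):
    """Fixed-factor partial weight, following the ticket's first-name example.

    factor=1.5 is the constant used in the ticket's arithmetic (an assumption).
    """
    return agree_w - (agree_w - disagree_w) * (1.0 - sim) * factor


febrl_w = febrl_partial_weight(0.85, 0.8, 10.0, 0.0)
paper_w = paper_partial_weight(0.85, 10.0, 0.0)

# Rounded to hide floating-point noise: roughly 2.5 versus 7.75.
print(round(febrl_w, 2), round(paper_w, 2))
```

The gap comes from the scaling term: the first version divides by (1 - threshold), so the weight drops from the full agreement weight to zero over the narrow range between the threshold and 1, while the second uses a fixed factor that does not depend on the threshold.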

## Discussion

• Peter Christen - 2011-12-14

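With a threshold set to 0.8, all similarities below 0.8 will get a weight of 0 (the disagreement weight). The weight then rises linearly from 0 at a similarity of 0.8 to 10 at a similarity of 1.0. Therefore it makes sense to have a partial weight of 2.5 for a similarity of 0.85. A similarity of 0.9 would get a weight of 5, and a similarity of 0.95 a weight of 7.5.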

I hope this makes sense. ;-)
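The values quoted in this reply can be checked with a short sketch of the cutoff-plus-linear-ramp behaviour it describes. `ramp_weight` is a hypothetical helper, not a Febrl function, and it assumes a disagreement weight of 0 as in the example.

```python
def ramp_weight(sim, threshold=0.8, agree_w=10.0):
    """Weight for a similarity score, assuming a disagreement weight of 0.

    Below the threshold the weight is 0; from the threshold up to 1.0 it
    rises linearly from 0 to agree_w.
    """
    if sim < threshold:
        return 0.0
    return agree_w - (1.0 - sim) / (1.0 - threshold) * agree_w


similarities = [0.75, 0.8, 0.85, 0.9, 0.95, 1.0]
weights = [ramp_weight(s) for s in similarities]

for s, w in zip(similarities, weights):
    # Rounded to hide floating-point noise.
    print(f"similarity {s:.2f} -> weight {round(w, 2)}")
```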

