#1 Jaro implementation

closed-postponed
ReverendSam
None
5
2007-02-07
2007-01-02
Anonymous
No

I’ve found two things wrong in the implementation of the Jaro algorithm, and since I am not familiar with CVS I thought I should post them here.

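1) In the computation of distance, the line should be
this.Distance = Math.Min(string1.Length, string2.Length) / 2 + Math.Min(string1.Length, string2.Length) % 2;
in order to have proper rounding: integer division alone truncates, so an odd length is rounded down, while adding the remainder rounds it up.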

2) To avoid the left vs. right distance difference that sometimes shows up, we have to edit the following line:
//compare char with range of characters to either side
for (int j = Math.Max(0, i - distance); !foundIt && j <= Math.Min(i + distance, string2.Length - 1); j++)
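
For illustration, the two lines above can be put into a small self-contained sketch. This is not the SimMetrics source: the class name JaroTicketSketch, the method names HalfLengthRoundedUp and CommonCharacters, and everything surrounding the two quoted lines (the StringBuilder copy, the '\0' placeholder for consumed characters) are assumptions made only so that the example compiles on its own.

```csharp
using System;
using System.Text;

static class JaroTicketSketch
{
    // Point 1: half of the shorter length, rounded up for odd lengths.
    // Plain integer division (shorter / 2) would truncate instead.
    public static int HalfLengthRoundedUp(string string1, string string2)
    {
        int shorter = Math.Min(string1.Length, string2.Length);
        return shorter / 2 + shorter % 2;
    }

    // Point 2: collect the characters of string1 that also occur in string2
    // no more than `distance` positions away, to the left or to the right.
    // Assumption: '\0' marks a consumed position, so inputs must not contain it.
    public static string CommonCharacters(string string1, string string2, int distance)
    {
        var common = new StringBuilder();
        var remaining = new StringBuilder(string2);

        for (int i = 0; i < string1.Length; i++)
        {
            char ch = string1[i];
            bool foundIt = false;

            // compare char with range of characters to either side
            for (int j = Math.Max(0, i - distance);
                 !foundIt && j <= Math.Min(i + distance, string2.Length - 1);
                 j++)
            {
                if (remaining[j] == ch)
                {
                    foundIt = true;
                    common.Append(ch);
                    remaining[j] = '\0'; // each character of string2 matches once
                }
            }
        }
        return common.ToString();
    }
}
```

With these definitions, JaroTicketSketch.HalfLengthRoundedUp("abcde", "abcdefgh") gives 3, where 5 / 2 alone gives 2. JaroTicketSketch.CommonCharacters("MARTHA", "MARHTA", 2) returns "MARTHA" and the swapped call returns "MARHTA", so both directions find six common characters.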

Keep up the good work!

Discussion

  • ReverendSam
    2007-01-12

    • status: open --> open-postponed
     
  • ReverendSam
    2007-01-12

    Logged In: YES
    user_id=1151038
    Originator: NO

    There is still debate on this issue about what is correct. Although I must admit I tend to agree with these changes, some code already exists that uses these metric scores, and changing them could invalidate those programs. This is intended to be fixed in the next release of SimMetrics, where it is likely that about 4 versions of Jaro will be added in place of the previous one. This will then allow people to choose the implementation they prefer.

     
  • ReverendSam
    2007-02-07

    • assigned_to: nobody --> reverendsam
    • status: open-postponed --> closed-postponed