Which Metric Combination

  • Tom

    Tom - 2007-06-23


    Please excuse my English, I'm from Germany...

    I'm looking for a metric to find duplicate data in a database, for example:


    String1: firma maier & co KG
    String2: KG maier firma & co KG

    These two strings need to come out as at least a 90 percent match.

    The strings are very similar: only the word order differs, and the second string contains one extra "KG".

    Any ideas?

    Best regards, Tom

    • ReverendSam

      ReverendSam - 2007-07-12

      If the tokens are always spelled identically and only their order varies, a cosine metric may be a good idea: terms have to match exactly, and the distance from a perfect match comes only from differences in which terms each string contains (and how often), never from their order.
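      A minimal sketch of this idea in Python, assuming case-insensitive, whitespace-separated tokens (the names `term_counts` and `cosine_tf` are illustrative, not from any particular library). Each string becomes a term-frequency vector, so word order has no effect; the only difference left between the two example strings is the extra "KG".

```python
import math
from collections import Counter

def term_counts(s: str) -> Counter:
    # Assumption: lower-case the text and split on whitespace.
    # Real company names may need more normalisation (punctuation, legal forms).
    return Counter(s.lower().split())

def cosine_tf(a: str, b: str) -> float:
    """Cosine similarity of term-frequency vectors; word order is ignored."""
    ca, cb = term_counts(a), term_counts(b)
    # Dot product only over terms that occur in both strings.
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    # Product of the two vector lengths, taken under a single square root.
    norm = math.sqrt(sum(v * v for v in ca.values()) * sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

s1 = "firma maier & co KG"
s2 = "KG maier firma & co KG"
score = cosine_tf(s1, s2)
print(f"{score:.4f}")  # 0.9487, i.e. 6 / sqrt(5 * 8)
print(score >= 0.90)   # True: above the 90 percent threshold
```

      With term frequencies the duplicated "KG" costs about five percentage points, which still leaves this pair above the requested 90 percent.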

      The metric could also be modified to ignore duplicate terms such as the second "KG" (this actually simplifies the algorithm).
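      One way to sketch that variant, assuming the same lower-case, whitespace tokenization: treat each string as a set of terms, i.e. a 0/1 presence vector. For such vectors the dot product is the size of the intersection and each length is the square root of the set size, so no frequency counting is needed. The names `term_set` and `cosine_set`, and the second comparison string, are hypothetical examples.

```python
import math

def term_set(s: str) -> set[str]:
    # Assumption: lower-case and split on whitespace.
    # A set keeps each term once, so a repeated "KG" no longer counts twice.
    return set(s.lower().split())

def cosine_set(a: str, b: str) -> float:
    """Cosine similarity of binary (presence/absence) term vectors."""
    ta, tb = term_set(a), term_set(b)
    if not ta or not tb:
        return 0.0
    # For 0/1 vectors: dot product = |A & B|, lengths = sqrt(|A|) and sqrt(|B|).
    return len(ta & tb) / math.sqrt(len(ta) * len(tb))

print(cosine_set("firma maier & co KG", "KG maier firma & co KG"))  # 1.0
# Hypothetical near-miss: one of five terms differs.
print(cosine_set("firma maier & co KG", "firma mueller & co KG"))   # 0.8
```

      Ignoring duplicates makes the two original strings a perfect match, while a genuinely different name still lowers the score.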

      Good luck.

