Levinstein that ignores word order

  • Guy El

    Guy El - 2009-09-06

    I am looking for the best metric for measuring the distance between sentences rather than between words.
    The sentences will hold no more than 20 words, and what I want to achieve is a Levinstein distance that allows words to be swapped where possible:

    Good day to you all
    to you all day Giid

    In this example the distance should be 2 (replacing "ii" with "oo"), and it should ignore all of the swaps made in the word order.

    Is it possible?

    • ReverendSam

      ReverendSam - 2009-09-08

      It is spelled "Levenshtein", not "Levinstein".

      Either way, what you need is a matching approach with Levenshtein/Needleman-Wunsch used at the token-match level. This should do exactly what you need. The easy way to do this is to take the source code for ChapmanMatchingSoundex and switch the internal metric (Soundex) so that it instantiates the Levenshtein metric instead. That is one line of code changed; all the others should be fine (not tested!). The new metric should then do the required task.
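
To make the token-level idea concrete, here is a minimal Python sketch. It is not the ChapmanMatchingSoundex source and not any library's API; the function names `levenshtein` and `token_match_distance` are made up for illustration. It assumes whitespace tokenization, case-sensitive comparison, and a greedy pairing of each word with its closest unused word in the other sentence, which is simple but not guaranteed to find the optimal pairing.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance; insert, delete and substitute cost 1 each."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def token_match_distance(s1: str, s2: str) -> int:
    """Order-insensitive sentence distance (hypothetical helper).

    Each word of s1 is paired greedily with its closest unused word of s2,
    and the character-level distances of the pairs are summed. Words left
    without a partner are charged their full length.
    """
    remaining = s2.split()
    total = 0
    for tok in s1.split():
        if not remaining:
            total += len(tok)  # no partner left in s2
            continue
        best = min(remaining, key=lambda other: levenshtein(tok, other))
        total += levenshtein(tok, best)
        remaining.remove(best)
    total += sum(len(t) for t in remaining)  # words of s2 never matched
    return total


print(token_match_distance("Good day to you all", "to you all day Giid"))
```

For the two sentences in the question this prints 2: "Good" pairs with "Giid" at a cost of two substitutions, and every other word finds an exact match regardless of position.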

