Levinstein that ignores word order

Guy El
2009-09-06
2013-04-25
  • Guy El
    2009-09-06

    Hello,
    I am looking for the best metric for measuring the distance between sentences rather than between single words.
    The sentences will hold no more than 20 words, and what I want to achieve is a levinstein distance that allows words to be swapped where possible:

    Good day to you all
    to you all day Giid

    In this example the distance should be 2 (replacing "ii" with "oo"), and all the swaps made in the word order should be ignored.

    Is it possible?

     
    • ReverendSam
      2009-09-08

      It's Levenshtein, not Levinstein.

      Either way, what you need is a matching approach with Levenshtein/Needleman-Wunsch used at the token-match level. This should do exactly what you need. The easy way is to take the source code for ChapmanMatchingSoundex and switch the internal metric (Soundex) to instantiate the Levenshtein metric. Only one line of code changes; all the others should be fine (not tested!). The new metric should then do the required task.
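
      To illustrate the idea of applying Levenshtein at the token-match level, here is a minimal, self-contained Python sketch. It is not the SimMetrics code and not the ChapmanMatchingSoundex source; the function names `levenshtein` and `token_match_distance` are made up for this example. It assumes whitespace tokenisation and a simple "best match per token" rule: each word of the first sentence is paired with its closest word in the second, and the per-word edit distances are summed.

```python
def levenshtein(s, t):
    """Classic edit distance: insert, delete, substitute, each costing 1."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def token_match_distance(a, b):
    """Sum, over the words of a, of the distance to the closest word of b.

    Hypothetical helper for illustration only. Word order is ignored
    because every word of a may match any word of b.
    """
    tokens_a = a.split()
    tokens_b = b.split()
    if not tokens_b:
        # Nothing to match against: every word has to be typed from scratch.
        return sum(len(tok) for tok in tokens_a)
    return sum(min(levenshtein(ta, tb) for tb in tokens_b) for ta in tokens_a)


print(token_match_distance("Good day to you all", "to you all day Giid"))  # 2
```

      Note that this best-match rule is not symmetric, and several words of the first sentence may pick the same word of the second. If a strict one-to-one pairing of words is needed, an assignment step would have to replace the inner `min`.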