I am looking for the best metric for measuring the distance between sentences rather than between single words.
The sentences will hold no more than 20 words, and what I want to achieve is a levinstein distance that swaps words where possible:
Good day to you all
to you all day Giid
In this example the distance should be 2 (replacing "ii" with "oo"), and all of the swaps made in the word order should be ignored.
Is it possible?
It is spelled "Levenshtein", not "levinstein".
Either way, what you need is a matching approach that uses Levenshtein or Needleman-Wunsch at the token-match level. That should do exactly what you need. The easy way to do this is to take the source code for ChapmanMatchingSoundex and switch its internal metric (Soundex) so that it instantiates Levenshtein instead. That is one line of code changed; everything else should be fine (not tested!). The new metric should then do the required task.
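To make the idea concrete, here is a rough, standalone sketch in Python of token-level matching with Levenshtein as the inner metric. It is not the SimMetrics code and not a port of ChapmanMatchingSoundex; the function names (`levenshtein`, `directed_distance`, `sentence_distance`) are made up for illustration, and the matching rule (each word is paired with its closest word in the other sentence, and the per-word distances are summed) is an assumption about what "token match level" should mean for this use case.

```python
def levenshtein(s, t):
    """Classic edit distance (insert, delete, substitute) between two strings."""
    if len(s) < len(t):
        s, t = t, s  # keep the shorter string in the inner loop
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (cs != ct)))  # substitution or match
        prev = cur
    return prev[-1]


def directed_distance(a_tokens, b_tokens):
    """Sum, over the words of a, of the distance to the closest word of b."""
    if not b_tokens:
        # Nothing to match against: every word has to be typed from scratch.
        return sum(len(tok) for tok in a_tokens)
    return sum(min(levenshtein(a, b) for b in b_tokens) for a in a_tokens)


def sentence_distance(a, b):
    """Word-order-insensitive distance (assumed rule: worse of both directions)."""
    a_tokens, b_tokens = a.split(), b.split()
    return max(directed_distance(a_tokens, b_tokens),
               directed_distance(b_tokens, a_tokens))


print(sentence_distance("Good day to you all", "to you all day Giid"))  # 2
```

Note that this simple rule lets several words match the same word in the other sentence. If a strict one-to-one pairing is needed, the inner loop would have to be replaced by an assignment step, which this sketch does not attempt.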