First of all, this is a great library for quick string comparison with a large metrics base (including explanations!). I really like it!
For my purpose, I want to 'normalize' names. Over time some descriptions have been changed or corrected. For example, small typos have been fixed or upper/lower case letters have been changed.
I want to rename all these small variations, taking into account the case of the characters (i.e. a case-sensitive comparison).
So far I have tried a couple of metrics but they all seem to ignore the character casing.
What am I doing wrong?
Thanks for any help/hints/tips!
To (partly) answer my own question, I have tested a few (!) metrics.
The following metrics are case sensitive:
The following appear to be case insensitive:
Block distance, Euclidean distance, Jaccard, Cosine,
I'm no expert on SimMetrics, but the two metrics I have tried so far were QGrams and Levenshtein, and both are case insensitive for me.
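If the built-in metrics do turn out to ignore case, one possible workaround is to compute the distance yourself. The following is only a minimal sketch of a plain Levenshtein distance that compares characters exactly, so 'A' and 'a' count as a substitution. The class and method names (`CaseAwareLevenshtein`, `distance`, `distanceIgnoreCase`, `similarity`) are made up for illustration and are not part of the SimMetrics API.

```java
import java.util.Locale;

// Hypothetical helper, NOT part of SimMetrics: a plain Levenshtein distance
// that compares characters exactly, so 'A' and 'a' are treated as different.
final class CaseAwareLevenshtein {

    // Edit distance using exact (case-sensitive) character comparison.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Same distance after lower-casing both inputs, for comparison.
    static int distanceIgnoreCase(String a, String b) {
        return distance(a.toLowerCase(Locale.ROOT), b.toLowerCase(Locale.ROOT));
    }

    // Similarity in [0, 1], normalized by the longer string's length.
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        return maxLen == 0 ? 1.0 : 1.0 - (double) distance(a, b) / maxLen;
    }

    public static void main(String[] args) {
        System.out.println(distance("Hello", "hello"));           // 1
        System.out.println(distanceIgnoreCase("Hello", "hello")); // 0
    }
}
```

With this sketch, "Hello" and "hello" are one edit apart in the case-sensitive version and identical in the lower-cased version, which is the difference needed to detect case-only renames.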