Implementation of various string similarity and distance algorithms
A library implementing about a dozen string similarity and distance measures, including Levenshtein edit distance and its siblings, Jaro-Winkler, n-gram, q-gram, Jaccard index, Longest Common Subsequence edit distance, and cosine similarity.
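To make two of the listed measures concrete, here is a minimal Python sketch of Levenshtein edit distance and a Jaccard index over character bigrams. It is an illustration only and not the library's API: the function names `levenshtein` and `jaccard_bigrams` are hypothetical, and the choice of bigrams (n = 2) is an assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    # prev[j] is the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def jaccard_bigrams(a: str, b: str) -> float:
    """Jaccard index of the sets of character bigrams of a and b."""
    grams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    grams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    if not grams_a and not grams_b:
        return 1.0  # convention chosen here: two bigram-free strings count as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)
```

Levenshtein is a distance (0 means identical), while the Jaccard index is a similarity in the range 0 to 1 (1 means identical bigram sets), so the two are not directly comparable without normalization.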
This is a tool for concern mining that takes a KDM model as input and outputs the same model annotated with concerns. It uses a Concern Library and a modified K-means string clustering algorithm with the Levenshtein metric to cluster the strings.
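The tool's specific modification to K-means is not described here. As a rough sketch of how strings can be clustered under the Levenshtein metric, the Python below uses a k-medoids-style variant, where each cluster center is an actual string from the data because strings have no arithmetic mean. The function `cluster_strings`, its parameters, and the initialization rule are all assumptions made for illustration, not the tool's method.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def cluster_strings(strings, k, iterations=10):
    """Group strings around k medoids (hypothetical helper).

    Returns (medoids, clusters), where clusters[i] lists the strings
    assigned to medoids[i].
    """
    # Assumption: the first k distinct strings serve as initial centers.
    medoids = list(dict.fromkeys(strings))[:k]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each string joins its nearest medoid.
        clusters = [[] for _ in medoids]
        for s in strings:
            nearest = min(range(len(medoids)),
                          key=lambda i: levenshtein(s, medoids[i]))
            clusters[nearest].append(s)
        # Update step: the new medoid minimizes total distance within its cluster.
        new_medoids = [
            min(c, key=lambda cand: sum(levenshtein(cand, o) for o in c)) if c else m
            for c, m in zip(clusters, medoids)
        ]
        if new_medoids == medoids:
            break  # converged
        medoids = new_medoids
    return medoids, clusters
```

The result depends on the initial centers, so a practical implementation would likely use a more careful initialization than taking the first k distinct strings.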
A simple Java library for text and object-oriented code.
The available packages include text analysis (Levenshtein and n-gram fingerprinting), a grammar framework, simple object persistence (very light and dependency-free), ...
SimMetrics is a similarity metric library, covering edit distances (Levenshtein, Gotoh, Jaro, etc.) as well as other metrics (e.g. Soundex, Chapman). Work provided by the University of Sheffield, UK, funded by AKT, an IRC sponsored by EPSRC, grant number GR/N15764/01.