Thanks to Tasos, you can now easily use SimMetrics from MS SQL, directly within queries.
for more details,
Thanks again, Tasos, for the good work and documentation on this useful addition.
In rare cases this could previously have returned negative scores (when the string contained multiple repeating tokens), i.e. it was only a problem with a limited and repetitive lexicon.
This issue is now resolved in the newly released V1.6.
Version 1.6 fixes a tokeniser bug which could tokenise incorrectly when whitespace delimiting characters were repeated. This inadvertently altered token-based metric scores in a few rare cases.
Version 1.6 of SimMetrics has been released.
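To illustrate the kind of tokeniser problem described above, here is a minimal sketch in plain Java. It is not the SimMetrics tokeniser; the class and method names are hypothetical. It assumes the issue is that splitting on a single delimiter yields empty tokens when delimiters repeat, and shows one way to avoid that.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of the SimMetrics API.
final class WhitespaceTokeniserSketch {

    // Naive approach: splitting on a single space produces empty tokens
    // whenever two delimiters sit next to each other.
    static List<String> naiveTokenise(String input) {
        return Arrays.asList(input.split(" "));
    }

    // More robust approach: treat any run of whitespace as one delimiter
    // and drop any empty token left over from leading whitespace.
    static List<String> tokenise(String input) {
        List<String> tokens = new ArrayList<>();
        for (String token : input.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "the  quick   fox"; // two spaces, then three spaces
        System.out.println(naiveTokenise(text).size()); // includes empty tokens
        System.out.println(tokenise(text));
    }
}
```

Empty tokens like these inflate token counts, which is how a token-based metric score can drift even though the visible words are identical.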
Added JUnit test classes for all metrics and component parts.
Fixed a whitespace tokeniser bug which could adversely influence metric scores in rare cases where repeating whitespace delimiting characters are found.
Adjusted the method that returns the normalised score for the Euclidean distance metric (this gave incorrect scores when the comparison text contained multiple repeating terms).
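As a rough illustration of why repeating terms matter when normalising a Euclidean distance, the sketch below builds token-count vectors and divides the distance by an upper bound, so the score stays between 0 and 1. This is one possible normalisation under my own assumptions, not necessarily the one used in V1.6, and all names here are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the SimMetrics implementation.
final class EuclideanSimilaritySketch {

    // Count how often each whitespace-delimited token occurs.
    static Map<String, Integer> countTokens(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Euclidean distance between the two token-count vectors.
    static double distance(Map<String, Integer> a, Map<String, Integer> b) {
        Set<String> vocabulary = new HashSet<>(a.keySet());
        vocabulary.addAll(b.keySet());
        double sumOfSquares = 0.0;
        for (String token : vocabulary) {
            int diff = a.getOrDefault(token, 0) - b.getOrDefault(token, 0);
            sumOfSquares += (double) diff * diff;
        }
        return Math.sqrt(sumOfSquares);
    }

    // Similarity in [0, 1]. With n1 and n2 the total token counts, the
    // distance can never exceed sqrt(n1^2 + n2^2), even when one token
    // repeats many times, so the result cannot go negative.
    static double similarity(String first, String second) {
        Map<String, Integer> a = countTokens(first);
        Map<String, Integer> b = countTokens(second);
        int n1 = a.values().stream().mapToInt(Integer::intValue).sum();
        int n2 = b.values().stream().mapToInt(Integer::intValue).sum();
        double maxDistance = Math.sqrt((double) n1 * n1 + (double) n2 * n2);
        if (maxDistance == 0.0) {
            return 1.0; // both inputs contain no tokens
        }
        return 1.0 - distance(a, b) / maxDistance;
    }

    public static void main(String[] args) {
        System.out.println(similarity("a a a", "b b b"));
        System.out.println(similarity("a b", "a b"));
    }
}
```

A bound based only on the number of distinct tokens would be too small for inputs such as "a a a" against "b b b", which is the sort of case where a normalised score can fall below zero.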
CVS source control has now been added, so you can always get the latest version of the code this way.
Another update to the .NET version. This removes the Java support class created by the MS Java to C# converter. I have added a new TokenUtilities class to provide the required functionality. There are also new tokenisers for qgram2, 3 and s, as well as 2, 3 and s extended.
SGrams allow for errors in the words using a cci index.
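For readers unfamiliar with q-gram tokenisers, here is a small sketch of the general technique in Java. It is not the TokenUtilities class or any of the tokenisers mentioned above. I am assuming that "extended" means padding the string at both ends so that its first and last characters appear in as many q-grams as the others; the `#` padding character and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the SimMetrics tokeniser classes.
final class QGramTokeniserSketch {

    static final char PAD = '#'; // assumed padding character

    // Split the input into overlapping substrings of length q.
    // When padded is true, q - 1 padding characters are added to each end.
    static List<String> qgrams(String input, int q, boolean padded) {
        if (q < 1) {
            throw new IllegalArgumentException("q must be at least 1");
        }
        String text = input;
        if (padded) {
            StringBuilder padding = new StringBuilder();
            for (int i = 0; i < q - 1; i++) {
                padding.append(PAD);
            }
            text = padding + input + padding;
        }
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + q <= text.length(); i++) {
            grams.add(text.substring(i, i + q));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(qgrams("sam", 2, false)); // [sa, am]
        System.out.println(qgrams("sam", 2, true));  // [#s, sa, am, m#]
    }
}
```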
v.1.1 .NET update.
This update cleans up the method and class names as per FxCop rules.
Lots of methods have been converted to properties, directory structures changed, unused source placed in separate directories.
Unit tests have been added, and generics are used where appropriate.
Another update - where the Java Support class has been removed - will be released shortly.
Updated to a new version: 2 new metric inclusions and other minor changes.
Version 1.4 is now released.
Now includes 1.5 optimisations.
minor bug fixes
Version 1.1 has been released. It adds to all metrics, and to the abstracted interface, the ability to return the un-normalised output of the underlying algorithm, while still supporting the previous normalised output.
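To make the difference between the two kinds of output concrete, the sketch below uses Levenshtein edit distance as an example: the un-normalised value is the raw edit count, and the normalised value maps it to a similarity between 0 and 1. The method names are hypothetical and are not taken from the SimMetrics interface, and dividing by the longer string's length is just one common choice.

```java
// Hypothetical illustration only; not the SimMetrics interface.
final class LevenshteinSketch {

    // Un-normalised output: the raw number of single-character edits
    // (insertions, deletions, substitutions) needed to turn a into b.
    static int unnormalisedDistance(String a, String b) {
        int[] previous = new int[b.length() + 1];
        int[] current = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            previous[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            current[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                current[j] = Math.min(
                        Math.min(current[j - 1] + 1, previous[j] + 1),
                        previous[j - 1] + cost);
            }
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[b.length()];
    }

    // Normalised output: 1.0 for identical strings, 0.0 when every
    // position of the longer string needs an edit.
    static double normalisedSimilarity(String a, String b) {
        int maxLength = Math.max(a.length(), b.length());
        if (maxLength == 0) {
            return 1.0; // two empty strings are identical
        }
        return 1.0 - (double) unnormalisedDistance(a, b) / maxLength;
    }

    public static void main(String[] args) {
        System.out.println(unnormalisedDistance("kitten", "sitting")); // 3
        System.out.println(normalisedSimilarity("kitten", "sitting"));
    }
}
```

The raw value is useful when the absolute number of edits matters, while the normalised value lets scores from strings of different lengths, or from different metrics, be compared on one scale.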
More news will follow shortly.
I have added a LOW TRAFFIC mailing list about major releases and developments in the SimMetrics library. If you want to subscribe to this mailing list, simply send a mail to
with the subject set to
You can unsubscribe at any time by mailing the same address with the subject UNSUBSCRIBE.
The JavaDocs will be updated online at the following address: http://www.dcs.shef.ac.uk/~sam/simmetrics/
Let me know of any problems.
The first open source version of SimMetrics is now released. Comments on usage, problems etc. would be appreciated.
Reverend Sam Chapman email@example.com