As a note to self (and others):
I've seen large differences in speed when calculating similarities between a large number of words given via the --file option to similarity.pl. In my case every word was compared to every other word, e.g. for a file like:
I don't know why, but it is much faster if you sort the file by the second field (sort +1 -2 inputfile > outputfile, or sort -k 2,2 inputfile > outputfile in current POSIX syntax).
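For anyone without a Unix sort at hand, the same reordering can be sketched in a few lines of Python. This is only a rough equivalent, and it assumes the input file holds one whitespace-separated word pair per line. The function names are made up for illustration and are not part of similarity.pl. Unlike sort, ties on the second field keep their original order here.

```python
def sort_pairs_by_second_field(lines):
    """Return the non-empty lines ordered by their second whitespace-separated field."""
    # Assumption: each line looks like "word1 word2"; blank lines are dropped.
    pairs = [line.rstrip("\n") for line in lines if line.strip()]

    def second_field(line):
        fields = line.split()
        # Lines with fewer than two fields get an empty key and sort first.
        return fields[1] if len(fields) > 1 else ""

    # sorted() is stable, so lines sharing a second field keep their input order.
    return sorted(pairs, key=second_field)


def sort_file(input_path, output_path):
    """Read input_path, sort its lines by the second field, write output_path."""
    with open(input_path, encoding="utf-8") as src:
        ordered = sort_pairs_by_second_field(src)
    with open(output_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(ordered) + ("\n" if ordered else ""))
```

Calling sort_file("inputfile", "outputfile") and then passing outputfile to --file should give the same grouping by second word as the sort command above.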
This is a fascinating observation, and off the top of my head I can't explain this behavior... but we'll check into it!