I am working on a string processing project in which I have to compare a small string, say a 5-character string, against a 10 MB text file, and I have to repeat the process
for roughly 1 million other 5-character strings.
So my question is: which metric should I use for the best performance and running time?
This sounds like the ideal job for FASTA/BLAST (not currently in SimMetrics, but if someone wants to add it I would be grateful!). These are typically (though not always) based on Smith-Waterman, but are optimised for matching shorter string segments against larger ones (e.g. a gene portion against an entire genome). For the time being I would either try an existing FASTA/BLAST implementation or use an extension of Smith-Waterman.
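To give a rough idea of what the Smith-Waterman route looks like, here is a minimal Python sketch of local alignment scoring for a short query against a long text. It is not SimMetrics code: the function name `smith_waterman_score` and the scoring values (match +2, mismatch -1, linear gap -1) are assumptions chosen for illustration. It keeps only two columns of the scoring matrix, so memory stays proportional to the query length rather than the file size.

```python
def smith_waterman_score(query, text, match=2, mismatch=-1, gap=-1):
    """Return (best_score, end_index) of the best local alignment.

    end_index is the position in `text` where the best alignment ends,
    or -1 if nothing scored above zero. Scoring values are illustrative.
    """
    m = len(query)
    prev = [0] * (m + 1)  # scores for the previous text character
    best_score, best_end = 0, -1

    for j, tc in enumerate(text):
        curr = [0] * (m + 1)  # scores for the current text character
        for i in range(1, m + 1):
            sub = match if query[i - 1] == tc else mismatch
            curr[i] = max(
                0,                    # start a fresh local alignment
                prev[i - 1] + sub,    # align query[i-1] with tc
                prev[i] + gap,        # tc is unmatched (gap in the query)
                curr[i - 1] + gap,    # query[i-1] is unmatched (gap in the text)
            )
            if curr[i] > best_score:
                best_score, best_end = curr[i], j
        prev = curr

    return best_score, best_end


if __name__ == "__main__":
    print(smith_waterman_score("hello", "say hello world"))
    print(smith_waterman_score("hello", "xx hallo yy"))
```

Each call costs time proportional to len(query) * len(text), so one million queries against a 10 MB file is still very expensive in pure Python. That cost is the reason FASTA/BLAST-style heuristics first look for short exact seed matches and only run the full alignment around them.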
Hope this helps