The fuzzy-matching library provides an OmniMark module, omfuzzy.xmd, with the pattern function fuzzy:
export switch function fuzzy
   read-only integer goals
   distance write-only integer best-distance
   deletion value integer deletion-distance optional
   insertion value integer insertion-distance optional
   replacement value integer replacement-distance optional
   transposition value integer transposition-distance optional
The function succeeds only if a prefix of the input approximately matches at least one of the keys in goals. The integer value of each goals shelf item sets the maximum Damerau–Levenshtein distance allowed between its key and the input.
The Damerau–Levenshtein distance is the minimum number of character deletions, insertions, replacements, and transpositions required to transform the target string into the input. The cost of each of the four transformations, if allowed, must be explicitly specified using the corresponding argument.
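The distance definition above can be illustrated with a small Python sketch. This is not the library's code: the names dl_distance and best_prefix_distance are hypothetical, the sketch uses the restricted (optimal string alignment) variant of the distance, and it assumes that an operation whose cost is left unspecified is simply not allowed. It is also not streaming; it only shows what is being measured.

```python
from typing import Optional

def dl_distance(target: str, text: str,
                deletion: Optional[int] = None,
                insertion: Optional[int] = None,
                replacement: Optional[int] = None,
                transposition: Optional[int] = None) -> Optional[int]:
    """Cheapest cost of turning `target` into `text`, or None if impossible.

    Assumption: a cost of None means the operation is not allowed.
    """
    INF = float("inf")
    n, m = len(target), len(text)
    # d[i][j] = cheapest cost of turning target[:i] into text[:j]
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and deletion is not None:
                # drop target[i-1]
                d[i][j] = min(d[i][j], d[i - 1][j] + deletion)
            if j > 0 and insertion is not None:
                # insert text[j-1]
                d[i][j] = min(d[i][j], d[i][j - 1] + insertion)
            if i > 0 and j > 0:
                if target[i - 1] == text[j - 1]:
                    # characters agree, no cost
                    d[i][j] = min(d[i][j], d[i - 1][j - 1])
                elif replacement is not None:
                    d[i][j] = min(d[i][j], d[i - 1][j - 1] + replacement)
            if (transposition is not None and i > 1 and j > 1
                    and target[i - 1] == text[j - 2]
                    and target[i - 2] == text[j - 1]):
                # swap two adjacent characters
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + transposition)
    return None if d[n][m] == INF else d[n][m]

def best_prefix_distance(target: str, text: str, **costs) -> Optional[int]:
    """Smallest distance between `target` and any prefix of `text`."""
    found = []
    for k in range(len(text) + 1):
        dist = dl_distance(target, text[:k], **costs)
        if dist is not None:
            found.append(dist)
    return min(found) if found else None
```

With all four costs set to 1, dl_distance("abcd", "abdc", ...) counts a single transposition, while leaving transposition unspecified forces two replacements instead. A goal would then match when best_prefix_distance for its key does not exceed the allowed distance stored as its value.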
The matching algorithm the library uses is quite naive and unoptimized. It is, however, fully streaming: the library is not restricted by line boundaries or any other record boundaries.
The provided test program demonstrates the use of the function. It can operate in two modes: line-based matching and word-based matching. The latter is the default. The program can take an arbitrary number of arguments. Every numeric argument specifies the allowed distance for the argument that follows it, and the pair becomes one of the goals. All other command-line arguments are treated as input file names.
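The argument convention described above can be sketched in Python as follows. This is an illustration only, not the test program's code: the name split_arguments is hypothetical, "numeric" is assumed to mean a run of decimal digits, and raising an error for a trailing numeric argument is a choice made for this sketch, since the behaviour of the test program in that case is not documented here.

```python
import re
from typing import Dict, List, Tuple

def split_arguments(args: List[str]) -> Tuple[Dict[str, int], List[str]]:
    """Split command-line arguments into goals and input file names."""
    goals: Dict[str, int] = {}   # key -> allowed distance
    files: List[str] = []
    it = iter(args)
    for arg in it:
        if re.fullmatch(r"[0-9]+", arg):
            # a numeric argument consumes the next argument as its key
            key = next(it, None)
            if key is None:
                raise ValueError("distance %s has no following key" % arg)
            goals[key] = int(arg)
        else:
            files.append(arg)
    return goals, files
```

For example, the arguments 1 hello input.txt 2 world would yield the goals hello (distance 1) and world (distance 2), with input.txt as the only input file.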