OmniMark Code

The fuzzy-matching library provides an OmniMark module omfuzzy.xmd with the pattern function fuzzy:

export switch function
   fuzzy               read-only  integer goals
         distance      write-only integer best-distance
         deletion      value      integer deletion-distance      optional
         insertion     value      integer insertion-distance     optional
         replacement   value      integer replacement-distance   optional
         transposition value      integer transposition-distance optional

The function succeeds only if a prefix of the input approximately matches at least one of the keys in goals. The integer value of each goals shelf item sets the maximum allowed Damerau–Levenshtein distance between that item's key and the input.

The Damerau–Levenshtein distance is the minimum number of character deletions, insertions, replacements, and transpositions required to transform the target string into the input. The cost of each of the four transformations, if it is to be allowed, must be specified explicitly through the corresponding optional argument (deletion, insertion, replacement, or transposition).
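To make the cost arguments concrete, here is a minimal Python sketch of a weighted distance computation. It is an illustration only, not the library's OmniMark implementation. It assumes that an operation whose cost is omitted is not allowed at all, and it uses the restricted (optimal string alignment) variant of the Damerau–Levenshtein distance, which may differ from what the library computes. The name weighted_distance is hypothetical.

```python
from typing import Optional

INF = float("inf")


def weighted_distance(target: str, text: str,
                      deletion: Optional[int] = None,
                      insertion: Optional[int] = None,
                      replacement: Optional[int] = None,
                      transposition: Optional[int] = None) -> float:
    """Cheapest cost of turning `target` into `text`.

    Assumption: an operation whose cost is None is forbidden.
    Returns infinity when no allowed sequence of operations exists.
    """
    def cost(value: Optional[int]) -> float:
        return INF if value is None else value

    d, i, r, t = (cost(deletion), cost(insertion),
                  cost(replacement), cost(transposition))
    n, m = len(target), len(text)

    # dist[a][b] = cheapest way to turn target[:a] into text[:b]
    dist = [[INF] * (m + 1) for _ in range(n + 1)]
    dist[0][0] = 0
    for a in range(n + 1):
        for b in range(m + 1):
            if a == 0 and b == 0:
                continue
            best = INF
            if a > 0:
                # delete target[a-1]
                best = min(best, dist[a - 1][b] + d)
            if b > 0:
                # insert text[b-1]
                best = min(best, dist[a][b - 1] + i)
            if a > 0 and b > 0:
                # keep a matching character for free, or replace it
                step = 0 if target[a - 1] == text[b - 1] else r
                best = min(best, dist[a - 1][b - 1] + step)
            if (a > 1 and b > 1
                    and target[a - 1] == text[b - 2]
                    and target[a - 2] == text[b - 1]):
                # swap two adjacent characters
                best = min(best, dist[a - 2][b - 2] + t)
            dist[a][b] = best
    return dist[n][m]
```

With all four costs set to 1, weighted_distance("ab", "ba", 1, 1, 1, 1) is 1, a single transposition. Leaving transposition out raises the result to 2, because the swap then has to be expressed with the other operations.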

The matching algorithm the library uses is quite naive and unoptimized. It is, however, fully streaming: matching is not restricted by line boundaries or any other record boundaries.
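One way to picture streaming matching is a matcher that consumes the input one character at a time and keeps only a single column of the edit-distance table per goal. The Python sketch below is such an illustration under simplifying assumptions: unit costs for deletion, insertion, and replacement, and no transpositions. It is not the library's algorithm, and the class name StreamingPrefixMatcher is hypothetical.

```python
from typing import Dict, List


class StreamingPrefixMatcher:
    """Tracks, per goal key, the distance to the input consumed so far."""

    def __init__(self, goals: Dict[str, int]) -> None:
        # goals maps each key to its maximum allowed distance
        self.goals = dict(goals)
        # column[a] = cost of turning key[:a] into the consumed input;
        # before any input arrives, that is a deletions
        self.columns: Dict[str, List[int]] = {
            key: list(range(len(key) + 1)) for key in self.goals
        }

    def feed(self, ch: str) -> Dict[str, int]:
        """Consume one character; return keys now within their limit."""
        matches: Dict[str, int] = {}
        for key, limit in self.goals.items():
            old = self.columns[key]
            new = [old[0] + 1]  # the empty key needs one more insertion
            for a in range(1, len(key) + 1):
                new.append(min(
                    old[a] + 1,                        # insert ch
                    new[a - 1] + 1,                    # delete key[a-1]
                    old[a - 1] + (key[a - 1] != ch),   # match or replace
                ))
            self.columns[key] = new
            if new[-1] <= limit:
                matches[key] = new[-1]
        return matches
```

Because the state is one short list per goal, the matcher never needs to see a complete line or record. A caller can feed characters as they arrive and stop as soon as feed reports a match.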

The provided test program demonstrates the use of the function. It can operate in two modes: line-based matching and word-based matching. The latter is the default. The program accepts an arbitrary number of arguments. Every numeric argument specifies the allowed distance for the argument that follows it, and the resulting pair becomes one of the goals. All other command-line arguments are treated as input file names.
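The argument convention described above can be sketched in Python as follows. This is not the test program's own code. The function name split_arguments is hypothetical, and raising an error when a numeric argument has nothing after it is an assumption, since the text does not say how that case is handled.

```python
from typing import Dict, List, Tuple


def split_arguments(args: List[str]) -> Tuple[Dict[str, int], List[str]]:
    """Split command-line arguments into goals and input file names."""
    goals: Dict[str, int] = {}
    files: List[str] = []
    remaining = iter(args)
    for arg in remaining:
        if arg.isascii() and arg.isdigit():
            # a numeric argument gives the allowed distance for the next one
            key = next(remaining, None)
            if key is None:
                # assumption: a trailing number without a key is an error
                raise ValueError("distance %s has no following key" % arg)
            goals[key] = int(arg)
        else:
            # everything else is taken to be an input file name
            files.append(arg)
    return goals, files
```

For example, the arguments 1 hello in.txt 2 world would produce the goals hello (distance 1) and world (distance 2), with in.txt as the only input file.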

