Learn how easy it is to sync an existing GitHub or Google Code repo to a SourceForge project! See Demo


Tree [530764] default /

File Date Author Commit
bench 2013-12-09 Victor Victor [15a3be] merged from devel
site 2013-12-09 Victor Victor [530764] minor fixes
src 2013-12-09 Victor Victor [15a3be] merged from devel
t 2013-12-09 Victor Victor [15a3be] merged from devel
.hgtags 2013-12-09 Victor Victor [ff389a] tagged rel-09dec2013
LICENSE 2013-05-19 Victor Victor [cebb99] sync default to the latest devel branch
README 2013-12-09 Victor Victor [15a3be] merged from devel
cl-string-match-test.asd 2013-12-09 Victor Victor [15a3be] merged from devel
cl-string-match.asd 2013-12-09 Victor Victor [15a3be] merged from devel

Read Me

CL-STRING-MATCH aims at providing robust implementations of string
matching algorithms. These algorithms are also called "substring
search" or "subsequence search" algorithms.

Corresponding article on Wikipedia is:


Currently it provides implementations of the following string matching

* Brute-force (also known as naïve algorithm)
* Boyer-Moore (with mismatched character heuristic)
* Boyer-Moore-Horspool algorithm
* Rabin-Karp algorithm
* Knuth-Morris-Pratt algorithm
* Aho-Corasick algorithm (with finite set of patterns)

Some string processing algorithms are also implemented:

* Simple (naїve) suffix tree construction algorithm
* Ukkonen's suffix tree construction algorithm

Data structures:

* Prefix trie
* Suffix tree

Project home page:


Also take a look at the project Wiki:



The project still lacks some important features and is under
development. Following tasks are still to be implemented:

* Comprehensive unit test suite: test if the functions in this package
  work properly

* Better replication of the SEARCH function parameters, implement
  search on generic sequences where possible, not just on
  strings. Search on byte sequences/arrays should be possible.

* Improve performance: some implementations (i.e. Aho-Corasick,
  Rabin-Karp) are extremely slow compared with theoretical boundaries.

* Benchmark should include corner cases (general worst-case scenarios)
  and also check correlation between needle and haystack sizes

* It is possible to generate code for automata and compile it into
  native code using standard Lisp facilities [as described by Peter

* Utilize random haystack for benchmarking similar to CL-IRREGSEXP

* Finite-State automaton algorithm

* Apostolico–Giancarlo algorithm

Algorithms with finite set of patterns:

* Commentz-Walter algorithm

* Rabin–Karp string search algorithm

* DAWG-match

Additional algorithms:

* Suffix tree and tree-based algorithms

* Thompson NFA

* Consider inexact matching and other algorithms on strings

;; EOF