Home

BNFA is a C++ regular expression matcher based on non-deterministic finite automata (NFA).

BNFA provides a way to construct regular expression rules and an engine for matching input against them.

Rules

BNFA uses overloaded operators for writing regular expressions. This syntax was inspired by Boost.Spirit.

Element                 Regular expression   BNFA
Optional                A?                   -A
One-or-more             A+                   +A
Zero-or-more            A*                   *A
Exactly N times         A{N}                 A(N)
At least N times        A{N,}                A(N,infinity)
Between M and N times   A{M,N}               A(M,N)
Concatenation           AB                   A >> B
Alternation             A | B                A | B
Separated list          A(?:BA)*             A % B
Capture                 (A)                  capture(A)
Positive lookahead      (?=A)                &A
Negative lookahead      (?!A)                !A
Character               a                    text("a")
Any character           .                    any()
Character sequence      abc                  text("abc")
Character group         [abc]                group("abc")
Character range         [a-z]                range('a', 'z')
Character class         [:alnum:]            alnum()
                        [:alpha:]            alpha()
                        ...                  ...
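
To illustrate how overloaded operators can stand in for regular expression syntax, here is a minimal sketch. It is not BNFA's implementation: the `expr` type and the `lit`, `grouped`, and `matches` helpers are hypothetical names invented for this example, and instead of building an NFA the sketch only assembles the equivalent pattern text and hands it to `std::regex`.

```cpp
#include <regex>
#include <string>

// Hypothetical toy type, not BNFA's rule class: it only records the
// regular expression text that corresponds to each overloaded operator.
struct expr {
    std::string pattern;
};

// Wrap a sub-pattern in a non-capturing group so that operators apply to all of it.
inline std::string grouped(const expr& e) { return "(?:" + e.pattern + ")"; }

// Literal text; assumed to contain no regular expression metacharacters.
inline expr lit(const std::string& s) { return expr{s}; }

inline expr operator-(const expr& a) { return expr{grouped(a) + "?"}; }  // optional
inline expr operator+(const expr& a) { return expr{grouped(a) + "+"}; }  // one-or-more
inline expr operator*(const expr& a) { return expr{grouped(a) + "*"}; }  // zero-or-more

// Concatenation: A >> B
inline expr operator>>(const expr& a, const expr& b) { return expr{grouped(a) + grouped(b)}; }

// Alternation: A | B
inline expr operator|(const expr& a, const expr& b) { return expr{grouped(a) + "|" + grouped(b)}; }

// Separated list: A % B is A followed by zero or more (B A) pairs.
inline expr operator%(const expr& a, const expr& b) { return a >> *(b >> a); }

// Check whether the whole input matches the assembled pattern.
inline bool matches(const expr& e, const std::string& input) {
    return std::regex_match(input, std::regex(e.pattern));
}
```

With these definitions, `-lit("a") >> +lit("b")` accepts "abb" and "b" but not "a", and `lit("x") % lit(",")` accepts "x,x,x" but not "x,". Because C++ has no postfix `?`, `+`, or `*` operators, the repetition operators are written as prefixes, which is the same convention the table above shows.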

Examples

  • Match a single vowel:

    rule vowel = group("AEIOUYaeiouy");

  • Match one or more vowels:

    rule vowels = +vowel;

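  • Match an identifier (an alphabetic character followed by zero or more alphanumeric characters):

    rule identifier1 = alpha() >> *alnum(); // Alternative #1: alphabetic, then zero or more alphanumerics

    rule identifier2 = !digit() >> +alnum(); // Alternative #2: one or more alphanumerics, not starting with a digit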
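
To check that the two alternatives accept the same inputs, the sketch below expresses both with `std::regex` instead of BNFA. The function names `is_identifier1` and `is_identifier2` are made up for this example, and the patterns are my translation of the two rules using the table above.

```cpp
#include <regex>
#include <string>

// Alternative #1: an alphabetic character followed by zero or more alphanumerics.
inline bool is_identifier1(const std::string& s) {
    static const std::regex re("[[:alpha:]][[:alnum:]]*");
    return std::regex_match(s, re);
}

// Alternative #2: a negative lookahead rejects a leading digit,
// then one or more alphanumerics must follow.
inline bool is_identifier2(const std::string& s) {
    static const std::regex re("(?![[:digit:]])[[:alnum:]]+");
    return std::regex_match(s, re);
}
```

Both functions accept "x1" and reject "1x" and the empty string. The lookahead consumes no input, so the first alphanumeric character is still matched by the repetition that follows it.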

  • Match a floating-point number, equivalent to the regular expression [+-]?([0-9]*\.[0-9]+|[0-9]+):

    rule floating = -group("+-") >> ( ( *digit() >> text(".") >> +digit() ) | +digit() );
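
For comparison, the regular expression quoted in the last example can be exercised directly with `std::regex`. This is only a way to see which inputs the pattern accepts; it does not use BNFA, and `is_floating` is a hypothetical helper name.

```cpp
#include <regex>
#include <string>

// Whole-input match against [+-]?([0-9]*\.[0-9]+|[0-9]+).
inline bool is_floating(const std::string& s) {
    static const std::regex re(R"([+-]?([0-9]*\.[0-9]+|[0-9]+))");
    return std::regex_match(s, re);
}
```

The pattern accepts "3.14", "-.5", and "+42". It rejects "1." because the branch containing the decimal point requires at least one digit after it.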

Matching Policies

The BNFA engine supports three different matching policies:

  • Match - Checks whether the input matches the rule.
  • Capture - Returns a list of the parts of the input that match the captures specified in the rule.
  • Lookup - Looks up data that matches the input. This works like a radix tree, but where a radix tree only handles prefixes, this matcher handles arbitrary regular expressions.
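
The three policies can be sketched with `std::regex` to show what each one returns. This is an illustration under stated assumptions and not BNFA's API: the function names `match`, `capture`, and `lookup` and their signatures are hypothetical, and the lookup here is a plain linear scan over the rules, standing in for whatever the engine does internally.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Match policy: does the whole input match the rule?
inline bool match(const std::regex& rule, const std::string& input) {
    return std::regex_match(input, rule);
}

// Capture policy: return the text of each capture group, or an empty list if there is no match.
inline std::vector<std::string> capture(const std::regex& rule, const std::string& input) {
    std::vector<std::string> parts;
    std::smatch m;
    if (std::regex_match(input, m, rule)) {
        for (std::size_t i = 1; i < m.size(); ++i) {
            parts.push_back(m[i].str());  // m[0] is the whole match, so start at 1
        }
    }
    return parts;
}

// Lookup policy: return a pointer to the data attached to the first rule
// that matches the input, or nullptr if no rule matches.
inline const int* lookup(const std::vector<std::pair<std::regex, int>>& table,
                         const std::string& input) {
    for (const auto& entry : table) {
        if (std::regex_match(input, entry.first)) {
            return &entry.second;
        }
    }
    return nullptr;
}
```

For a rule such as `([a-z]+)=([0-9]+)` and the input "key=42", `match` returns true and `capture` returns the two strings "key" and "42". `lookup` behaves like a map whose keys are patterns instead of fixed strings.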