#81 regexp lazy quantification

Alex Bukanov

This example
set aa {
interface ethernet 1/1
switchport native vlan 5
interface ethernet 1/2
switchport native vlan 334
interface ethernet 1/3
switchport native vlan 4093
interface ethernet 1/4
switchport native vlan 2
! };
regexp "ethernet 1/3.*?native vlan (\[0-9]{1,3})" $aa bb cc;
set cc;
returns a single digit, "4".
But if I drop the lazy quantifier ".*?" and use a greedy ".*" instead, like this:
regexp "ethernet 1/3.*native vlan (\[0-9]{1,3})" $aa bb cc;
set cc ;
it returns the correct number, "4093".

Is it a bug?

I use Expect on Ubuntu Linux 10.04.


  • Mixing greedy and non-greedy quantifiers in one RE is known to be problematic. There's a rather sibylline warning in re_syntax.n:

    The matching rules for REs containing both normal and non-greedy
    quantifiers have changed since early beta-test versions of this package.
    (The new rules are much simpler and cleaner, but do not work as hard at
    guessing the user's real intentions.)
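
In practice, re_syntax says a branch takes the preference of the first quantified atom in it that has one. Here the leading ".*?" makes the whole RE prefer the shortest match, so the digit quantifier stops after one digit. A minimal sketch of one possible workaround, assuming data shaped like the question's: keep the lazy quantifier, but require the number to end at a word boundary with \M. The variable names short and full are illustrative only.

```tcl
set aa {
interface ethernet 1/1
switchport native vlan 5
interface ethernet 1/2
switchport native vlan 334
interface ethernet 1/3
switchport native vlan 4093
interface ethernet 1/4
switchport native vlan 2
!
}

# The first quantifier (.*?) is non-greedy, so the whole RE prefers the
# shortest match and the capture stops after a single digit.
regexp {ethernet 1/3.*?native vlan ([0-9]+)} $aa -> short

# \M matches only at the end of a word, so even the shortest possible
# match has to include every digit of the number.
regexp {ethernet 1/3.*?native vlan ([0-9]+)\M} $aa -> full

puts "unanchored: $short, anchored: $full"
```

With the question's data, the unanchored pattern captures "4" and the anchored one captures "4093". Bracing the pattern also removes the need to escape "[" as in the quoted form.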

    Independently of this issue, your parsing task is not well handled by one global "blind" regexp. The following input will match, but with the wrong semantics:

    interface ethernet 1/3
    ! (missing vlan line)
    interface ethernet 1/4
    switchport native vlan 4093
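
To see the problem concretely, here is a small sketch that runs a single regexp over that input. The pattern is an assumed variant of the one in the question, with the number anchored by \M, and the variable names are illustrative.

```tcl
set bad {
interface ethernet 1/3
! (missing vlan line)
interface ethernet 1/4
switchport native vlan 4093
}

# "." also matches newlines by default, so .*? runs past the start of the
# next interface block until it reaches some "native vlan" line.
set found [regexp {ethernet 1/3.*?native vlan ([0-9]+)\M} $bad -> vlan]
puts "found=$found vlan=$vlan"
```

The match succeeds and reports 4093 for ethernet 1/3, although that value belongs to ethernet 1/4.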

    So I suggest building a saner, stateful line-by-line parser first. Such DSLs are very simply handled by [eval]:

    # Each config keyword maps to a proc named dsl_<keyword>.
    proc dsl_interface {x y} {set ::itf $x/$y}
    proc dsl_switchport {x y z} {lappend ::sw($::itf) $x $y $z}

    foreach line [split $input \n] {
        set line [string trimleft $line]
        switch -glob -- $line {
            !* - "" {continue}
            interface* - switchport* {}
            * {error "Syntax error: $line"}
        }
        # Only whitelisted keywords reach this point.
        eval dsl_$line
    }

    # now look at $::sw(ethernet/1/3)
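
As a usage sketch, the same idea can be wrapped in a helper and run on both the well-formed and the broken input. The proc name switchport_of and the two sample strings are hypothetical, introduced only for illustration. The sketch assumes every switchport line follows an interface line.

```tcl
proc dsl_interface {x y} {set ::itf $x/$y}
proc dsl_switchport {x y z} {lappend ::sw($::itf) $x $y $z}

# Hypothetical helper: parse a config text and return the words recorded
# for one interface, or an empty list if it has no switchport line.
proc switchport_of {input itf} {
    unset -nocomplain ::sw ::itf
    foreach line [split $input \n] {
        set line [string trim $line]
        switch -glob -- $line {
            !* - "" {continue}
            interface* - switchport* {}
            default {error "Syntax error: $line"}
        }
        eval dsl_$line
    }
    if {[info exists ::sw($itf)]} {return $::sw($itf)}
    return {}
}

set good "interface ethernet 1/3\nswitchport native vlan 4093\ninterface ethernet 1/4\nswitchport native vlan 2"
set bad  "interface ethernet 1/3\n! (missing vlan line)\ninterface ethernet 1/4\nswitchport native vlan 4093"

set r_good [switchport_of $good ethernet/1/3]
set r_bad  [switchport_of $bad ethernet/1/3]
puts "good: $r_good"
puts "bad:  [llength $r_bad] words recorded"
```

For the well-formed input the helper returns "native vlan 4093". For the input with the missing vlan line it returns an empty list, instead of borrowing the value from ethernet 1/4.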